Scaling Web Applications - 2009

Scaling Web Applications, Andrei Gheorghe, idevelop.ro



The most common case

Shared Hosting

Where the problems appear

• Server processing power: CPU, RAM, etc.

• Bandwidth

• Storage capacity

• The database

Web Server + DB Server

Load Balancing


• Hardware
– Balancing is done at the packet transport level
– Expensive; knows nothing about the application's architecture

• DNS Load Distribution ("Round Robin")
– Statistically spreads traffic evenly
– Knows nothing about server availability
– DNS caching can cause problems
– Only a solution at very large scale

• Reverse Proxy

Reverse Proxy Load Balancing

• A single front end for multiple servers

• Security

• Accelerating SSL requests (SSL is handled at the proxy)

• Caching

• nginx, squid, lighttpd
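Unlike DNS round robin, a reverse proxy can skip a server it knows is down. A minimal sketch of the balancing step, assuming plain round-robin with one health flag per backend; the backend names are hypothetical and real proxies such as nginx do this through their own configuration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any that are marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["app1:8080", "app2:8080", "app3:8080"])
lb.mark_down("app2:8080")
print([lb.next_backend() for _ in range(4)])
# → ['app1:8080', 'app3:8080', 'app1:8080', 'app3:8080']
```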

Relational Databases

tables, columns, joins

MySQL Replication
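Classic MySQL replication copies the master's changes to one or more replicas, asynchronously by default, so applications usually send writes to the master and spread reads across the replicas. A minimal routing sketch, assuming hypothetical host names and a naive check of the first SQL keyword; reads can be slightly stale because replicas lag behind the master.

```python
import random


class ReplicatedDB:
    """Route each statement to the master (writes) or to a replica (reads)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def route(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        # Writes must go to the master; replicas only replay the master's changes.
        if verb in self.WRITE_VERBS or not self.replicas:
            return self.master
        # Reads are spread across replicas (they may lag behind the master).
        return random.choice(self.replicas)


db = ReplicatedDB("db-master:3306", ["db-replica1:3306", "db-replica2:3306"])
print(db.route("UPDATE posts SET post_comment_count = 3 WHERE post_id = 10"))
print(db.route("SELECT * FROM posts WHERE post_id = 10"))
```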

MySQL Cluster

• Data nodes
– Applications do not interact with them directly

• Management node
– Configures and monitors the cluster

• SQL node (mysqld process)
– A MySQL server that connects to the data nodes to read or store data

• Generally, each node will run on a separate host

MySQL Cluster

• Synchronous Replication
– Data is replicated across multiple nodes so it stays available if a data node disconnects

• Horizontal Data Partitioning
– Data is automatically partitioned across all data nodes using an algorithm based on the primary key

• Hybrid Storage
– memory / disk

• Shared Nothing
– “no single point of failure”
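Partitioning on the primary key means the key alone decides which data node stores a row. A minimal sketch, assuming a generic hash-modulo scheme; it illustrates the idea and is not MySQL Cluster's internal partitioning function, and it leaves out the synchronous copy each partition also gets on another node.

```python
import hashlib


def node_for_key(primary_key, node_count):
    """Map a primary key to a data node using a stable hash of the key."""
    # md5 keeps the mapping identical across processes; Python's built-in
    # hash() of a str is randomized per process, so it is unsuitable here.
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % node_count


placement = {}
for pk in range(1, 11):
    placement.setdefault(node_for_key(pk, 2), []).append(pk)
print(placement)  # every row lands on exactly one of the 2 nodes
```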

Normalization

• Means bringing the database into a “normal form”

• Data is structured into tables with relations between them, and each piece of information appears only once

• Ensures data consistency when operations are performed on the database

Normalization / Denormalization

• USERS: user_id, user_name, user_password

• POSTS: post_id, post_author_id

• COMMENTS: c_id, c_post_id, c_text

Normalization / Denormalization

• USERS: user_id, user_name, user_password

• POSTS: post_id, post_author_id, post_author_name

• COMMENTS: c_id, c_post_id, c_text

Normalization / Denormalization

• USERS: user_id, user_name, user_password

• POSTS: post_id, post_author_id, post_author_name, post_comment_count

• COMMENTS: c_id, c_post_id, c_text
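The denormalized POSTS table trades redundancy for cheaper reads: post_author_name saves a JOIN with USERS, and post_comment_count saves a COUNT(*) over COMMENTS. The cost is that the application must keep the copies in sync. A minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, user_password TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_author_id INTEGER,
                    post_author_name TEXT,
                    post_comment_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE comments (c_id INTEGER PRIMARY KEY, c_post_id INTEGER, c_text TEXT);
""")
conn.execute("INSERT INTO users VALUES (1, 'andrei', 'secret-hash')")
# Denormalized: the author's name is copied into the post row at write time.
conn.execute("INSERT INTO posts (post_id, post_author_id, post_author_name) "
             "SELECT 10, user_id, user_name FROM users WHERE user_id = 1")


def add_comment(conn, post_id, text):
    # Insert the comment and bump the cached counter in one transaction,
    # so readers never need COUNT(*) over COMMENTS.
    with conn:
        conn.execute("INSERT INTO comments (c_post_id, c_text) VALUES (?, ?)",
                     (post_id, text))
        conn.execute("UPDATE posts SET post_comment_count = post_comment_count + 1 "
                     "WHERE post_id = ?", (post_id,))


add_comment(conn, 10, "first!")
add_comment(conn, 10, "nice post")
# A single-table read returns the author name and comment count, no JOIN needed.
print(conn.execute("SELECT post_author_name, post_comment_count "
                   "FROM posts WHERE post_id = 10").fetchone())
# → ('andrei', 2)
```

The same discipline applies to post_author_name: if a user is renamed, every post row holding the old copy has to be updated as well.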

Key → Value Databases


• Distributed, persistent hash tables
• "Eventual consistency"

• Some allow SELECT queries with conditions (e.g. SimpleDB)

• Require some degree of data denormalization
• Inconsistencies must be handled manually, and the correct data propagated

• MemcacheDB, CouchDB, Amazon SimpleDB, Hypertable, Google BigTable
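"Eventual consistency" means replicas may disagree for a while and are reconciled later. A minimal sketch of one common reconciliation rule, last-write-wins by timestamp; it assumes the writers' clocks are roughly synchronized, the Replica class is hypothetical, and real stores differ in the rules they apply.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: object
    timestamp: float  # assumption: writers' clocks are roughly synchronized


class Replica:
    """One copy of a key-value store that reconciles with last-write-wins."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, timestamp):
        self._apply(key, Versioned(value, timestamp))

    def get(self, key):
        item = self.data.get(key)
        return None if item is None else item.value

    def _apply(self, key, incoming):
        current = self.data.get(key)
        # Keep the newer version; an equal timestamp keeps the current value,
        # so true ties would need an extra tie-breaker (e.g. a replica id).
        if current is None or incoming.timestamp > current.timestamp:
            self.data[key] = incoming

    def sync_from(self, other):
        # Anti-entropy pass: merge every entry from another replica.
        for key, item in list(other.data.items()):
            self._apply(key, item)


a, b = Replica(), Replica()
a.put("user:1:name", "Andrei", timestamp=1.0)
b.put("user:1:name", "Andrei G.", timestamp=2.0)  # concurrent write on another node
print(a.get("user:1:name"), "|", b.get("user:1:name"))  # → Andrei | Andrei G.
a.sync_from(b)
b.sync_from(a)
print(a.get("user:1:name"), "|", b.get("user:1:name"))  # → Andrei G. | Andrei G.
```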

Sharding

Vertical Sharding

• One server for users, one for search, etc.

• JOINs across tables are done manually, in the application
• Denormalizing the DB reduces the need for JOINs

[Diagram: separate USERS, COMMENTS and SEARCH servers]
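When users and comments live on different servers, the JOIN becomes application code: collect the foreign keys, fetch the matching rows from the other server in one batch, then stitch them together. A minimal sketch in which dicts stand in for the two servers; the c_user_id column and all the data are hypothetical.

```python
# Hypothetical data living on two separate servers (dicts stand in for them).
USERS_SERVER = {1: {"user_name": "ana"}, 2: {"user_name": "dan"}}
COMMENTS_SERVER = [
    {"c_id": 100, "c_user_id": 1, "c_text": "hello"},
    {"c_id": 101, "c_user_id": 2, "c_text": "hi"},
    {"c_id": 102, "c_user_id": 1, "c_text": "bye"},
]


def fetch_users(user_ids):
    # Stand-in for one batched query on the users server,
    # e.g. SELECT ... FROM users WHERE user_id IN (...)
    return {uid: USERS_SERVER[uid] for uid in user_ids}


def comments_with_authors(comments):
    # 1. Collect the foreign keys returned by the comments server.
    user_ids = {c["c_user_id"] for c in comments}
    # 2. Fetch all the needed users in a single round trip.
    users = fetch_users(user_ids)
    # 3. Stitch the rows together in application code (the manual JOIN).
    return [dict(c, user_name=users[c["c_user_id"]]["user_name"]) for c in comments]


rows = comments_with_authors(COMMENTS_SERVER)
for row in rows:
    print(row["c_text"], "by", row["user_name"])
```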

Horizontal Sharding

• Splitting the rows of one table across multiple servers

• The partitioning algorithm is very important
– Depending on the chosen algorithm, rebalancing data when the topology changes can be difficult

• A central directory (lookup table) can be used
– Transparent algorithm
– Easier to rebalance
– Can become a single point of failure (SPOF)

[Diagram: users split across shards USR #1, USR #2, USR #3]
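Why the choice of algorithm matters for rebalancing: with plain modulo placement, adding a shard changes the home of most keys, while a central directory only changes the entries you deliberately move. A minimal sketch with hypothetical user ids; ShardDirectory is a hypothetical class, and in practice the directory would itself live in a database or cache, which is where the single point of failure comes from.

```python
def modulo_shard(user_id, shard_count):
    """Simple placement: shard index = user_id mod number of shards."""
    return user_id % shard_count


user_ids = range(1, 10_001)
moved = sum(1 for uid in user_ids if modulo_shard(uid, 3) != modulo_shard(uid, 4))
print(f"{moved / len(user_ids):.0%} of users move when going from 3 to 4 shards")
# → 75% of users move when going from 3 to 4 shards


class ShardDirectory:
    """Hypothetical central directory: an explicit user_id -> shard mapping."""

    def __init__(self, initial_shards):
        self.initial_shards = list(initial_shards)
        self.mapping = {}

    def shard_for(self, user_id):
        # New users get an initial placement; afterwards the directory is authoritative.
        if user_id not in self.mapping:
            index = user_id % len(self.initial_shards)
            self.mapping[user_id] = self.initial_shards[index]
        return self.mapping[user_id]

    def move(self, user_id, new_shard):
        # Rebalancing is one entry update (plus copying that user's rows).
        self.mapping[user_id] = new_shard
```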

Advantages of sharding

• High availability
– If one server crashes, the application keeps working (only the data on that shard is affected)

• Faster queries
– Queries run over smaller chunks of data, so they execute faster

• Higher write throughput
– With no central server, writes execute in parallel across shards

Cache

memcached

memcached -d -u www -m 2048 -l 10.0.0.8 -p 11211

• Distributed hash table kept in RAM

• set(key, value)
• get(key)
• delete(key)

• The value is usually an entire serialized object
– e.g. article + comments + author info

• Client libraries for memcached exist for virtually every programming language, including PHP

memcached

• "Least Recently Used" (LRU) eviction: when memory is full, the entries used least recently are discarded first

• In a network with multiple servers, memcached instances can be pooled into a memcache cluster; the client hashes each key to one node, so the cache is partitioned across nodes rather than replicated

• memcached runs on Linux and Windows and can be started anywhere there is free RAM
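A minimal sketch of the LRU policy itself, using Python's OrderedDict. This is a simplification: memcached applies LRU separately within each slab class and sizes memory in bytes rather than entries.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")       # "a" is now more recent than "b"
cache.set("c", 3)    # over capacity: "b" is evicted
```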

Session Clustering

Load Balancing Revisited

Session Clustering

• Store in a common filesystem
– Not useful in multi-server environments
– NFS will cache pages

• Store in a database
– Very fast, because you are only ever looking up primary keys
– Make sure the DB has row locking (InnoDB), not table locking

• Store in memcached
– Stored across several machines rather than just one
– A total machine failure now affects only a percentage of users rather than everyone
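A minimal sketch of the "store in database" option: every web server reads and writes sessions through the same table, so the load balancer can send a user's next request anywhere. Python's built-in sqlite3 is used only to keep the example runnable; SQLite locks the whole database on writes, so in production this would be the InnoDB table the slide recommends. WebServer and the session contents are hypothetical.

```python
import json
import secrets
import sqlite3

shared_db = sqlite3.connect(":memory:")
shared_db.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, data TEXT NOT NULL)")


class WebServer:
    """Hypothetical front-end node; all nodes share one session table."""

    def __init__(self, name, db):
        self.name = name
        self.db = db

    def save_session(self, session_id, data):
        with self.db:
            self.db.execute("INSERT OR REPLACE INTO sessions VALUES (?, ?)",
                            (session_id, json.dumps(data)))

    def load_session(self, session_id):
        # Primary-key lookup: the only query the session layer ever needs.
        row = self.db.execute("SELECT data FROM sessions WHERE session_id = ?",
                              (session_id,)).fetchone()
        return json.loads(row[0]) if row else {}


web1 = WebServer("web1", shared_db)
web2 = WebServer("web2", shared_db)
sid = secrets.token_hex(16)
web1.save_session(sid, {"user_id": 42, "cart": [7, 9]})
print(web2.load_session(sid))  # → {'user_id': 42, 'cart': [7, 9]}
```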

Content Delivery Network

• A collection of web servers distributed across multiple locations to deliver static content more efficiently to users.

• The server selected for delivering content to a specific user is typically based on a measure of network proximity.
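A minimal sketch of the selection rule, assuming network proximity is measured as round-trip time and the measurements are already available. Real CDNs usually steer users through DNS or anycast, and the edge names and RTT values here are hypothetical.

```python
def pick_edge(rtt_ms_by_edge, healthy=None):
    """Return the edge server with the lowest measured round-trip time."""
    candidates = {edge: rtt for edge, rtt in rtt_ms_by_edge.items()
                  if healthy is None or edge in healthy}
    if not candidates:
        raise ValueError("no edge servers available")
    return min(candidates, key=candidates.get)


measurements = {"edge-frankfurt": 38.0, "edge-bucharest": 6.5, "edge-virginia": 120.0}
print(pick_edge(measurements))  # → edge-bucharest
```

Passing a `healthy` set lets the same rule fall back to the next-closest edge when one location is down.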

Multiple Codebases

• If the server and site architecture allow it, interesting things can be done by running different code

• Using a reverse proxy, you can send 10% of visitors to a 2.0 beta version of the site and observe how they interact with it

• If things don't go as planned, you switch back to the original code, and only 10% of visitors were affected
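A minimal sketch of the 10% split, assuming each visitor carries a stable id (for example a cookie value) that is hashed into 100 buckets, so the same visitor keeps seeing the same version. backend_for_visitor and the backend names are hypothetical; in practice this decision would sit in the reverse proxy's configuration.

```python
import hashlib

BETA_PERCENT = 10  # share of visitors sent to the 2.0 beta codebase


def backend_for_visitor(visitor_id, beta_percent=BETA_PERCENT):
    """Pick a codebase for a visitor; the same visitor always gets the same answer."""
    # Hash the visitor id into one of 100 buckets.
    bucket = int(hashlib.sha1(visitor_id.encode("utf-8")).hexdigest(), 16) % 100
    return "beta-2.0" if bucket < beta_percent else "stable"


visitors = [f"visitor-{i}" for i in range(10_000)]
share = sum(backend_for_visitor(v) == "beta-2.0" for v in visitors) / len(visitors)
print(f"{share:.1%} of visitors see the beta")
```

Rolling back is a one-value change: with `beta_percent=0` every visitor is routed to the stable code again.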

Case studies: highscalability.com

LAMP

Shards

Memcached

Squid

Smarty

ImageMagick

• More than 4 billion queries per day

• ~35M photos in squid cache (total)

• ~2M photos in squid’s RAM

• ~470M photos, 4 or 5 sizes of each

• 38k req/sec to memcached (12M objects)

• 2 PB raw storage (about 1.5 TB consumed on Sunday)

• Over 400,000 photos being added every day

• Debian Linux, Apache, PHP, MySQL

• memcached

• MemcacheDB: a distributed key-value storage system that conforms to the memcache protocol

→15,000 writes/second, 64,000 reads/second

• Lots of servers

• 26 million uniques a month

• 30 million users.

• Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg buttons.

• 2 billion requests a month

• 13,000 requests a second, peak at 27,000 requests a second.

• Data are separated into separate clusters: User Actions, Users, Comments, Items, etc.

• Asynchronous queuing architecture for near-term processing
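With asynchronous queuing, the request handler only records the work and returns; background workers process it shortly afterwards. A minimal sketch with Python's standard queue and threading modules; the job contents are hypothetical, and a production setup would use a separate queue server rather than an in-process queue.

```python
import queue
import threading

jobs = queue.Queue()
results = []
results_lock = threading.Lock()


def handle_request(item_id):
    # The web request only enqueues the work and answers right away.
    jobs.put({"action": "vote", "item_id": item_id})
    return "202 Accepted"


def worker():
    while True:
        job = jobs.get()
        if job is None:                 # sentinel: this worker should stop
            jobs.task_done()
            return
        # Hypothetical deferred work, e.g. updating counters or notifying followers.
        with results_lock:
            results.append(("processed", job))
        jobs.task_done()


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for item_id in range(5):
    handle_request(item_id)
jobs.join()                             # block until every queued job is done
for _ in workers:
    jobs.put(None)                      # one sentinel per worker
for w in workers:
    w.join()
print(len(results))  # → 5
```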

Amazon Web Services

Simple Storage Service (S3)

• Cloud storage service

• Servers in the US / Europe

• REST API

• Storage: $0.150 / GB

• Upload: $0.100 / GB

• Download: $0.170 / GB

• Twitter uses S3 for user pictures
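A rough cost estimate from the three prices above. Assumptions: storage is billed per GB per month at a flat rate, and any volume tiers or per-request fees are ignored; the usage figures are hypothetical.

```python
# Prices from the slide, in USD per GB. Assumption: storage is billed per GB
# per month at a flat rate; volume tiers and per-request fees are ignored.
PRICE_PER_GB = {"storage": 0.150, "upload": 0.100, "download": 0.170}


def monthly_s3_cost(stored_gb, uploaded_gb, downloaded_gb):
    """Rough monthly bill for a given storage level and transfer volume."""
    return (stored_gb * PRICE_PER_GB["storage"]
            + uploaded_gb * PRICE_PER_GB["upload"]
            + downloaded_gb * PRICE_PER_GB["download"])


cost = monthly_s3_cost(stored_gb=100, uploaded_gb=100, downloaded_gb=500)
print(f"${cost:.2f}")  # → $110.00
```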

Elastic Compute Cloud (EC2)

• On-demand server instances

• You can start a server with root access in 5 minutes

• $0.10 / hour, 99.95% guaranteed uptime

– about 4 hours of downtime per year

• Static IP addresses can be allocated and complex architectures can be built

• Fast access to S3
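Both EC2 figures translate into yearly numbers, assuming a 365-day year and one instance running around the clock:

```latex
\begin{aligned}
\text{downtime} &= (1 - 0.9995) \times 365 \times 24\,\text{h} = 0.0005 \times 8760\,\text{h} = 4.38\,\text{h/year} \\
\text{cost} &= \$0.10/\text{h} \times 8760\,\text{h} = \$876/\text{year}
\end{aligned}
```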

SimpleDB

• Distributed hash DB

• Allows SELECT queries with conditions

• Queries are limited to 5 seconds

Thank you, come again