Curs 1 - Universitatea Babeş-Bolyaivniculescu/didactic/PPD/C1.pdf · – Hardware, retele, SO,...

Post on 20-Mar-2018

226 views 1 download

Transcript of Curs 1 - Universitatea Babeş-Bolyaivniculescu/didactic/PPD/C1.pdf · – Hardware, retele, SO,...

Curs1

ProgramareParalelasiDistribuita

Curs 1 - PPD - 2

Con4nutulcursului(realizatsipebazapeh:p://grid.cs.gsu.edu/~tcpp/curriculum/?q=home)

Teore%c•  No4uniintroduc4ve:arhitecturi,

concurenta,paralelism•  Etapeindezvoltareaprogramelorparalele•  Evaluareaperformanteiprogramelor

paralele•  Modeledeprogramareparalela

–  Diferentaintrecelebazatepememoriepartajatasimemoriedistribuita

•  Pa#erns–  Ptprogramareparalela–  Ptprogramaredistribuita

Prac%c•  Javathreads(lowlevelAPI)•  C++(C++11)threads•  High-levelAPI:pacheteJava->

java.u4l.concurrentpackages.•  Javastreams•  OpenMP(C++)•  CUDA(C++)•  MPI–MessagePassingInterface

–  exemplificariC,C++

Curs 1 - PPD - 3

Bibliografie•  IanFoster.DesigningandBuildingParallelPrograms,Addison-Wesley1995.•  BernaL.Massingill,TimothyG.Ma:son,andBeverlyA.Sanders,AddisonAPa:ernLanguagefor

ParallelProgramming.WesleySodwarePa:ernsSeries,2004.•  MichaelMcCool,ArchRobinson,JamesReinders,StructuredParallelProgramming:Pa:ernsfor

EfficientComputa4on,”MorganKaufmann,,2012.•  D.Culler,J.PalSingh,A.Gupta.ParallelComputerArchitecture:AHardware/SodwareApproach.

MorganKaufmann.1998.•  Grama,A.Gupta,G.Karypis,V.Kumar.Introduc4ontoParallelCompu4ng,AddisonWesley,2003.•  D.Grigoras.CalcululParalel.Delasistemelaprogramareaaplica4ilor.ComputerLibrisAgora,2000.•  V.Niculescu.CalculParalel.Proiectaresidezvoltareformalaaprogramelorparalele.PresaUniv.

Clujana,2006.•  B.Wilkinson,M.Allen,ParallelProgrammingTechniquesandApplica4onsUsingNetworked

Worksta4onsandParallelComputers,Pren4ceHall,2002•  A.Williams.C++ConcurrencyinAc4onPRACTICALMULTITHREADING.ManningPublisher.2012.•  TutorialeJava:h:p://docs.oracle.com/javase/tutorial/essen4al/concurrency/further.html•  C++11h:p://en.cppreference.com/w/•  OpenMP:h:p://openmp.org/•  MPI:h:p://www.mpi-forum.org/

Curs 1 - PPD - 4

Evaluare

•  Laborator(25%)–  Exerci4ilaborator–  ProgramefolosindJava/C++threads,OpenMP,MPI

•  Seminar(10%)–  prezenta,par4cipareac4va

•  Testeprac4ce–3X10%•  Examen

–  Scris– sesiune35%

•  Informa4icurs–  h:p://www.cs.ubbcluj.ro/~vniculescu/didac4c/PPD/CursPPD.html

5Curs 1 - PPD -

ProcesareParalela

•  Uncalculatorparalelesteuncalculator(sistem)carefolosestemul%pleelementedeprocesaresimultanaintr-omanieracoopera%vapentruarezolvaoproblemacomputa%onala.

•  ProcesareaParalelaincludetehnicisitehnologiicarefacposibilcalcululinparalel–  Hardware,retele,SO,biblioteci,limbaje,compilatoare,algoritmi…

•  Paralelismulestenatural.

•  PERFORMANTA–  Parallelismisverymuchaboutperformance!

6Curs 1 - PPD -

Curs 1 - PPD - 7

CalculSerialvs.Paralel(imagesfromIntroduc%ontoParallelCompu%ngBlaiseBarney)

Itwouldappearthatwehavereachedthelimitsofwhatitispossibletoachievewithcomputertechnology,althoughoneshouldbecarefulwithsuchstatements,astheytendtosoundpre9ysillyin5years.(JohnvonNeumann,1949)

Curs 1 - PPD - 8

Curs 1 - PPD - 9

Limitealeprogramariiseriale•  Vitezadetransmisie–

•  Vitezaluminii(30cm/nanosecond),limitadetransmisiepefirdecupru(9cm/nanosecond).

•  Limitareaminiaturizarii–numardetrazistoripechip.–  LegealuiMoore:număruldetranzistori

carepotfiplasa4peussingurcircuitintegrat(persquareinchchip)sedubleazalafiecare2ani.

–  Darimpunecosturimari.

•  Limitarieconomice

Istoric

•  CrestereaperformanteiprocesorprincrestereafrecventeiceasuluiCPU(CPUclockfrequency)–  RidingMoore’slaw

•  Probleme:incalzireaputernicaachipurilor!–  Frecventaceasmaimare⇒consumelectricmaimare(Pen:um4heatsink¦FryinganeggonaPen:um4)

•  Solu4e–adugaremaimultorcore-uriptaajungelaperformantadorita–  Sepastreazafrecventadeceaslafelsauchiarmicsorare–  nucresteconsumul.

Curs 1 - PPD - 10

Nivelurideparalelism

1.paralelismlaniveldejob:-intrejoburi;-intrefazealejoburilor;2.paralelismlaniveldeprogram:-intrepărţialeprogramului;-inanumitecicluri;3.paralelismlaniveldeinstrucţiune:

-intrediferitefazedeexecuţiealeuneiinstrucţiuni;4.paralelismlanivelaritme%cşilaniveldebit:-  intreelementealeuneioperaţiivectoriale;-  intrecircuitelelogiciiaritme4ce.

Curs 1 - PPD - 11

•  Arhitecturilecurentesebazeazatotmaimultpeparalelismlanivelhardwarepentruaimbunata4performanta:

–  Mul4pleexecu4onunits–  Pipelinedinstruc4ons–  Mul4-core

Curs 1 - PPD - 12

Paralelism<->Concurenta

•  Considerammaimultetaskuricaretrebuieexecutatepeuncalculator•  Taskurileseconsideraafipurparaleledaca:

–  Sepotexecutainacelasi4mp(parallelexecu:on)•  Rezultacanuexistadependenteintreele;

•  Dependente-execu4econcurenta:–  Untaskarenevoiederezultatelealtora;–  Untasktrebuiesaseexecutedupaceoanumitacondi4eeindeplinita–  Maimultetaskuriincearcasafoloseascaaceeasiresursa

=>Formedesincronizaretrebuiefolositepentruasa4sfacedependentele;

•  Concurentaestefundamentalaincomputerscience–  Sistemedeoperare,bazededate,networking,…

13Curs 1 - PPD -

Paralelismvs.Concurenta

14 Curs 1 - PPD -

Obs:–  Sepotfolosithreadurisauproceseinambelecazuri–  Dacauncalculparalelnecesitaacceslaresursecomuneatunci

estenevoiesaseges4onezecorectconcurenta=>Paralelismulpoateimplicaconcurenta

Paralelism:Sefolosescmaimulteresursepentruarezolvaoproblemamairapid

resources

Concurenta:Ges4uneacorectasieficientaaaccesuluilaresursecomune

requestswork

resource

ConcurentasiParalelism

•  Concurent<>paralel!•  Execu4eParalela:

–  Taskurileseexecutaefec4vinacelasi4mp;

–  Estenecesaraexistentademul4pleresursedecalcul

•  Paralelism=concurenta+hardware“paralel”

15Curs 1 - PPD -

Paralelism

•  Existamaimultenivelurideparalelism:–  Procese,threads,rou4ne,instruc4uni,…

•  Trebuiesafiesuportatederesurselehardware–  Procesoare,cores,…(execu4ainstruc4unilor)–  Memorii,DMA,retele,…(opera4iasociate)

16Curs 1 - PPD -

Decesafolosimprogramareparalela?

•  Mo4veprimare:–  Timpdecalculmairapid(response:me)–  Rezolvareaprobelemelor‘mari’decalcul(in4mprezonabildecalcul)

•  Mo4vesecundare:–  Folosireaefec4vaaresurselordecalcul–  Costurireduse–  Reducereaconstrangerilorasociatememoriei–  Limitarilemasinilorseriale

•  Paralelism=concurenta+parallelHardware+performanta

17Curs 1 - PPD -

Curs 1 - PPD - 18

•  Rezolvareaproblemelordificile,mari:–  "GrandChallenge"(en.wikipedia.org/wiki/Grand_Challenge)problemsrequiring

PetaFLOPSandPetaBytesofcompu4ngresources.–  Websearchengines/databasesprocessingmillionsoftransac4onspersecond

•  Folosirearesurselornon-locale:

–  SETI@home(se4athome.berkeley.edu)usesover330,000computersforacomputepowerover528TeraFLOPS(asofAugust04,2008)

–  Folding@home(folding.stanford.edu)usesover340,000computersforacomputepowerof4.2PetaFLOPS(asofNovember4,2008)

Direc4iinprocesareaparalela

•  Arhitecturiparalele–  Necesita4Hardware–  Computersystemdesign

•  Sistemedeoperare(Paralelism/concurenta)•  Ges4onareaaspectelorsistemptuncalculatorparalel•  Programareparalela

–  Biblioteci(low-level,high-level)–  Limbaje–  Mediidedezv.–  Sodware

•  AlgoritmiParaleli•  Evaluareaperformanteiprogramelorparalele•  Testareavs.asigurareacorec4tudinii•  Paralleltools:

–  Performanta,analize,vizualizare,…

19Curs 1 - PPD -

Decesastudiemprogramareparalela?

•  Arhitecturidecalcul–  Inova4ileconduclanoimodeledeprogramare

•  Convergentatehnologica–  “killermicro”estepestetot–  Laptop-urilesisupercomputeresuntfundamentalsimilare–  Trend-urileactualeconduclaconvergentaabordarilordiverse

•  Treduriletehnologicefaccalcululparalelinevitabil–  Mul4-coreprocessors!–  Acumoricesistemdecalculesteparalel

•  Intelegereaprincipiilorfundamentale–  Programare,comunica4i,memorie,…–  Performanta

•  “Parallelismisthefutureofcompu%ng”-BlaiseBarney–  M.Andrews,J.S.Walicki.“Concurrencyandparallelism—futureofcompu4ng”in

ProceedingofACM'85Proceedingsofthe1985ACMannualconferenceonTherangeofcompu4ng:mid-80'sperspec4ve.pp.224-231.

20Curs 1 - PPD -

InevitabilitateaProcesariiParalele

•  Cerinteleptaplica4i–  Necesitateauriasadecicluridecalcul

•  Trenduritehnologice–  Procesaresimemorie

•  TrenduriArchitecturale•  Factorieconomici•  Treduriactuale:

–  Today’smicroprocessorshavemul:processorsupport–  Serversandworksta:onsavailableasmul:processors–  Tomorrow’smicroprocessorsaremul:processors–  Mul:-coreisheretostayand#cores/processorisgrowing–  Accelerators(GPUs,gamingsystems)

21Curs 1 - PPD -

Aplica4iinforma4ceperformante

•  Performantaaplica4ilorimpunehardwareperformant(rapid,resursemul4ple,etc)

•  Hardware-ulavansatgenereazanoiaplica4i•  Noileaplica4iaucererideperformantamaimari

–  Crestereexponen4alaaperformanteimicroprocesoarelor–  Inova4iinarhitecturileparalelesiinintegrare

•  Cerintedeperformanta=>–  Performantasistemelortrebuiesaseimbunatateascainansamblu–  Schimbari/abordari/reevaluariinSodwareengineering–  Costuri-technologieavansata

applica4onsperformance

hardware

22Curs 1 - PPD -

Curs 1 - PPD - 23

Programareparalelavs.Programaredistribuita

INTERCONNECTION NETWORK

P2

P3

P1

P4 P5

Pn . . . .

Curs 1 - PPD - 24

TiPURI DE MULTIPROCESARE PARALLEL DISTRIBUTED

ASPECTE TEHNICE • PARALLEL COMPUTERS (- IN MOD UZUAL ) LUCREAZA BAZAT PE

•  CUPLARE STRANSA, •  in general bazate pe SINCRONICITATE, •  CU UN SISTEM DE COMUNICATIE FOARTE RAPID SI FIABIL •  Spatiu unic de adresare (intr-o masura mare)

•  DISTRIBUTED COMPUTERS •  MAI INDEPENDENTE, •  COMUNICATIE MAI PUTIN FRECVENTA SI mai putin RAPIDA (ASINCRONA) •  COOPERARE LIMITATA •  NU EXISTA CEAS GLOBAL • “Independent failures”

SCOPURI

•  PARALLEL COMPUTERS COOPEREAZA PENTRU A REZOLVA MAI EFICIENT PROBLEME DIFICILE

•  DISTRIBUTED COMPUTERS AU SCOPURI INDIVIDUALE SI ACTIVITATI PRIVATE. DOAR UNEORI INTERCOMUNICAREA ESTE NECESARA

PARALLEL COMPUTERS: COOPERARE IN SENS POSITIV

DISTRIBUTED COMPUTERS: COOPERARE IN SENS NEGATIV, DOAR ATUNCI CAND ESTE NECESARA

Curs 1 - PPD - 25

Aplicatii paralele Suntem interesati sa rezolvam problemele mai rapid in paralel

Aplicatii distribuite

Suntem interesati sa rezolvam anumite probleme specifice :

• COMMUNICATION SERVICES ROUTING BROADCASTING

• MAINTENANCE OF CONTROL STUCTURE TOPOLOGY UPDATE LEADER ELECTION • RESOURCE CONTROL ACTIVITIES LOAD BALANCING MANAGING GLOBAL DIRECTORIES

Ingeneral…

Curs 1 - PPD - 26

Parallelv.s.DistributedSystems(fromM.FUKUDACSS434SystemModels)

Parallel Systems Distributed Systems

Memory Tightly coupled shared memory UMA, NUMA

Distributed memory Message passing, RPC, and/or used of distributed shared memory

Control Global clock control SIMD, MIMD

No global clock control Synchronization algorithms needed

Processor interconnection

Order of Tbps Bus, mesh, tree, mesh of tree, and hypercube (-related) network

Order of Gbps Ethernet(bus), token ring and SCI (ring), myrinet(switching network)

Main focus Performance Ex. - Scientific computing

Performance(cost and scalability) Reliability/availability Information/resource sharing

Unpunctdevedere…

Curs 1 - PPD - 27

SistemeleDistribuite

-potfifolositepentru-

•  Aplica4idistribuiteimplicit–  BDDistribuite,rezervaribilieteavion/etc.sistembancar

•  Informa4ipartajateintreuseri•  Partajareresurse•  Raportcost/performantamaibunptaplica4iparalele

–  Potfifolositeeficientpt.aplica4icugranularitatemare(coarse-grained)si/sauptaplica4iparalelede4pembarrassinglyparallelapplica:ons

•  Fiabilitate(Reliability).•  Scalabilitate

–  Cuplareslaba(Looselycoupledconnec4on);hotplug-in

•  Flexibilitate–  Reconfiguraresistemptaintrunicerintele

Curs 1 - PPD - 28

Performanta/Scalabilitate

Spredeosebiredesistemeleparaleleceledistribuiteimplica:-  interven4asistemuluideoperare-  mediumaipu4nrapiddetransferaldatelor(reteamaipu4nrapida)-  HeterogenitateSolu4i:-Procesarebatchamesajelor:

•  Seevitainterven4aSOptfiecaretransferdemesaj.–  Cachedata

•  Seevitarepetareatransferuluiaceleiasidate–  Evitateaen4ta4lorsiaalgoritmilorcentraliza4

•  Evitareasaturariiretelei–  Realizareopera4i“post”lanivelulclientului

•  Evitareatraficuluiintensintreclien4siservere

–  ….

Curs 1 - PPD - 29

Securitate

•  Nuexistadoarunsingurpunctdecontrol•  Probleme:

–  Mesaje,furate,modificate,copiate,…•  Solu4e:folosireCriptografie

–  Failures•  FaultTolerancesolu4ons

Tipuridesistemedistribuite(Munehiro Fukuda – PDP Fundamentals)

•  Modele–  Minicomputer–  Worksta4on–  Worksta4on-server–  Processor-pool–  Cluster–  Gridcompu4ng

Curs 1 - PPD - 30

31

ModelulMinicomputer

•  ExtensieasistemuluideTimesharing–  Useriitrebuiesaseloghezepepropriulminicomputer.–  Apoiselogheazalaaltamasina(remotemachine)prinprogramede

4ptelnet.•  Partajareresurse

–  DB–  High-performancedevices

Mini-computer

Mini-computer

Mini-computer

net

Curs 1 - PPD -

Curs 1 - PPD - 32

ModelulWorksta4on

•  MigrareProcese–  Useriiselogheazamaiintaipesta4adelucrupersonala;–  Dacaexistasta4i“inasteptare”–unjob“mare”poatemigrala

unadintreele.•  Probleme:

–  Cumseiden4ficasta4ile“inasteptare”(idle)?–  Cummigreazaunjob?–  Ceseintampladacaunaltuserselogheazapemasinafolosita?

100MbpsLAN

Worksta4on

Worksta4on Worksta4on

Worksta4onWorksta4on

Curs 1 - PPD - 33

ModelulWorksta4on-Server

•  Sta4iClient–  Aplica4ileGrafice/interac4veseproceseazalocal–  Ptaltecereridecalculsetrimitcererilaservere.

•  Servere(minicomputers)–  Fiecareserverestededicatunuiasaumaimultor

4purideservicii.•  Modeldecomunicare:Client-Servermodel

–  RPC(RemoteProcedureCall)–  RMI(RemoteMethodInvoca4on)

•  Unprocesclientcheamaofunc4eaprocesuluiserver.

•  Nusefacemigraredeprocese•  Examplu:NFS

100GbpsLAN

Worksta4on

Worksta4on Worksta4on

Mini-Computerfileserver

Mini-ComputerhNpserver

Mini-Computercycleserver

Curs 1 - PPD - 34

ModelulProcessor-Pool

•  Clien4i:–  Selogheazalaunterminal–  Toateserviciilesuntges4onatede

catreservere.•  Servere:

–  Ptfiecareusersealocanrnecesardeprocesoaredinpool.

•  U4lizarebunadarinterac4vitateslaba.

Server1

100MbpsLAN

ServerN

Curs 1 - PPD - 35

ClusterModel

•  ConstainmaimultePC/worksta4onsconectatelaoreteade4phigh-speed.

•  Focuspeperformanta.100Mbps

LAN

Worksta4on

Worksta4on Worksta4on

Masternode

Slave1

SlaveN

Slave2

1GbpsSAN

h:pserver1

h:pserver2

h:pserverN

Curs 1 - PPD - 36

High-speedInforma4onhighway

GridCompu4ng

•  Scop–  Colectareaputeridecalculamaimultor

supercomputeresiclusteredispersategeografic

•  DistributedSupercompu:ng–  Ptproblemwfoartemari.Dificile.

(CPUintensive,memoryintensive).•  High-ThroughputCompu4ng

–  Folosireamultorresursecarenusuntfolosite

•  On-DemandCompu4ng–  Resurseladistantaintegrateincalculullocal

•  Data-intensiveCompu4ng–  distributeddata

•  Collabora4veCompu4ng–  Suportptcomunicareintremaimultepar4

Super-computer

Cluster

Super-computer Cluster

Mini-computer

Worksta4on

Worksta4on Worksta4on