PP16-lec5-arch4.ppt


  • 8/16/2019 PP16-lec5-arch4.ppt

    1/27

    1.1

    Parallel Processing sp2016

    lec#5

    Dr M Shamim Baig 

  • 8/16/2019 PP16-lec5-arch4.ppt

    2/27

    Explicitly Parallel Processor Architectures:

     Task-level Parallelism

    1.2

  • 8/16/2019 PP16-lec5-arch4.ppt

    3/27

    1.3

    Elements of (Explicit) Parallel Architectures

    • Processor configurations:
      - Instruction/Data Stream based

    • Memory Configurations:
      - Physical & Logical based
      - Access-Delay based

    • Inter-processor communication:
      - Communication-Interface design
      - Data Exchange/Synch approach

  • 8/16/2019 PP16-lec5-arch4.ppt

    4/27

    1.4

    Parallel Platforms: Memory (Physical vs Logical) Configurations

    • Physical vs Logical Memory Config
      - Physical Memory config (SM, DM, CSM)
      - Logical Address Space config (SAS, NSAS)
      - Combinations

    • CSM-SAS (SMP, UMA)

    • DM-SAS (DSM, NUMA)

    • DM-NSAS (Multicomputer/Clusters)

  • 8/16/2019 PP16-lec5-arch4.ppt

    5/27

    1.5

    Shared memory (SM) Multiprocessor

    • It is important to note the difference between Shared Memory & Shared Address Space.

    • The former is a physical memory config, while the latter is the logical memory address view for the program.

    • It is possible to provide a Shared Address Space using a physically distributed memory.

    • SM-multiprocessor systems are SAS-config, using a physical memory configuration of either CSM or DM (= DSM).

  • 8/16/2019 PP16-lec5-arch4.ppt

    6/27

    1.6

    UMA vs NUMA

    • SM-multiprocessors are further categorized, based on memory access delay, as UMA (uniform memory access) & NUMA (non-uniform memory access).

    • A UMA system is based on the (CSM-SAS) config, where each processor has the same delay for accessing any memory location.

    • A NUMA system is based on the (DM-SAS = DSM) config, where a processor may have a different delay for accessing different memory locations.
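
As a rough illustration of the UMA/NUMA distinction above, the sketch below computes a weighted-average memory access delay for one processor. The function name, the cycle counts and the 80% local fraction are illustrative assumptions, not figures from the lecture.

```python
def average_access_delay(local_cycles, remote_cycles, local_fraction):
    """Weighted-average access delay (in cycles) seen by one processor."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_cycles + (1.0 - local_fraction) * remote_cycles


# UMA: every location costs the same, so the local/remote split is irrelevant.
uma_delay = average_access_delay(10, 10, 0.8)

# NUMA: local memory is cheap and remote memory is expensive (hypothetical values).
numa_delay = average_access_delay(4, 40, 0.8)

print(f"UMA average delay:  {uma_delay:.1f} cycles")
print(f"NUMA average delay: {numa_delay:.1f} cycles")
```

In this simple model the NUMA average moves toward the local delay as the fraction of local accesses rises.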

  • 8/16/2019 PP16-lec5-arch4.ppt

    7/27

    1.7

    UMA & NUMA Arch Block Diagrams

    Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.

    UMA (CSM-SAS)    NUMA (DM-SAS = DSM)

    Both are SM-multiprocessors, differing in Memory Access Delay format.

  • 8/16/2019 PP16-lec5-arch4.ppt

    8/27

    1.8

    Simplistic view of a small shared memory Symmetric Multi Processor (SMP):

    (CSM-SAS, Bus)

    Examples:

    • Dual Pentiums

  • 8/16/2019 PP16-lec5-arch4.ppt

    9/27

    1.9

    [Figure labels: interface; I/O bus; Processor/memory bus; Shared memory]

  • 8/16/2019 PP16-lec5-arch4.ppt

    10/27

    1.10

    Multicomputer (Cluster) Platform

    Complete computers P (CU, PE) & DM with NSAS & interconnection network interface at I/O bus level.

    [Figure labels: Processor; Interconnection network; Local memory; Computers; Messages]

    • These platforms comprise a set of processors and their own (exclusive/distributed) memory.

    • Instances of such a view come naturally from non-shared-address-space (NSAS) multicomputers, e.g. clustered workstations.

  • 8/16/2019 PP16-lec5-arch4.ppt

    11/27

    1.11

    Data Exchange/Synch Approaches: Shared Data vs Message-Passing

    • There are two primary approaches to data exchange/synch in parallel systems:
      - Shared Memory Model
      - Message-Passing Model

    • SM-multiprocessors use the Shared-Data approach for data exchange/synch.

    • Multicomputers (Clusters) use the Message-Passing approach for data exchange/synch.
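
The two data exchange/synch approaches can be contrasted with a small sketch. It is only an analogy: both versions run as threads inside one Python process, whereas a real multicomputer would exchange messages over an interconnection network. The function names and the sample data are assumptions made for illustration.

```python
import queue
import threading


def shared_data_sum(chunks):
    """Shared-data style: workers update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # synchronisation on the shared variable
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message-passing style: workers keep private data and send results."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send; no shared variable is written

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Explicit receive: exactly one message per worker.
    return sum(mailbox.get() for _ in chunks)


chunks = [[1, 2, 3], [4, 5], [6]]
print(shared_data_sum(chunks), message_passing_sum(chunks))
```

Both functions compute the same sum. In the first, synchronisation is on a shared variable. In the second, it is implied by the send and receive of each message.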

  • 8/16/2019 PP16-lec5-arch4.ppt

    12/27

    1.12

    Data Exchange/Synch Platforms: Shared-memory vs Message-Passing

    • Shared memory platforms have low comm overhead and can support lower grain levels, while message passing platforms have more comm overhead & are therefore more suited to coarse grain levels.

    • SM Multiprocessors are faster but have poor scalability.

    • Message passing Multicomputer platforms are slower but have higher scalability.
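
The link between communication overhead and grain size can be sketched with a simple ratio of useful work to total time. The model and all timing values below are hypothetical, chosen only to show the trend, and are not measurements from the lecture.

```python
def efficiency(compute_time_us, comm_time_us):
    """Fraction of a task's time spent on useful computation."""
    return compute_time_us / (compute_time_us + comm_time_us)


# Hypothetical per-task communication overheads, in microseconds.
SHARED_MEMORY_COMM_US = 1.0
MESSAGE_PASSING_COMM_US = 100.0

fine_grain_us = 10.0        # small task
coarse_grain_us = 10_000.0  # large task

fine_on_shared = efficiency(fine_grain_us, SHARED_MEMORY_COMM_US)
fine_on_messages = efficiency(fine_grain_us, MESSAGE_PASSING_COMM_US)
coarse_on_messages = efficiency(coarse_grain_us, MESSAGE_PASSING_COMM_US)

print(f"fine grain, shared memory:     {fine_on_shared:.2f}")
print(f"fine grain, message passing:   {fine_on_messages:.2f}")
print(f"coarse grain, message passing: {coarse_on_messages:.2f}")
```

With these assumed numbers, fine-grain tasks stay efficient only when the overhead is low. Coarse-grain tasks amortise the larger message-passing overhead.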

  • 8/16/2019 PP16-lec5-arch4.ppt

    13/27

    1.13

    Clusters as a Computing Platform

    • Clusters: a network of computers became a very attractive alternative to expensive supercomputers used for high-performance computing in the early 1990s.

    • Several early projects, notably:
      - NASA Beowulf project
      - Berkeley NOW (network of workstations) project

  • 8/16/2019 PP16-lec5-arch4.ppt

    14/27

    1.14

    Beowulf Clusters*

    • A group of interconnected commodity computers achieving high performance with low cost.

    • Typically using commodity interconnects, e.g. high speed Ethernet, & OS, e.g. Linux.

    * Beowulf comes from the name given by the NASA Goddard Space Flight Center cluster project.

  • 8/16/2019 PP16-lec5-arch4.ppt

    15/27

    1.15

    Advantages of Cluster Computer (NOW-like):

    • Processing nodes are high performance PCs/workstations, readily available at low cost.

    • Interconnection of processing nodes using high performance LANs/SANs.

    • Easily upgradable, incorporating the latest processors into the system as they become available.

    • Easily scalable to bigger & more powerful systems.

    • Existing software can be easily adapted for parallel execution on a Cluster system.

  • 8/16/2019 PP16-lec5-arch4.ppt

    16/27

    1.16

    Cluster Interconnects: LAN vs SAN

    • LANs: Fast / Gbits / 10-Gbits Ethernet

    • SANs: Myrinet ... OS calls causing more processing delays

  • 8/16/2019 PP16-lec5-arch4.ppt

    17/27

    1.17

    Vector/Array Data Processors

    • Vector proc: 1D-Temporal parallelism using pipelined Arith unit & Vector chaining
      Float add pipe: Comp exp, align mant, add mant, Normalize

    • Array proc: 1D-Spatial parallelism using ALU-array as SIMD

    • Systolic Array: combines 2-D spatial parallelism with pipelined (computational wavefront)

    Block Diagrams of Vector/array & Systolic processing
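
The temporal parallelism of the four-stage float add pipe can be sketched with the usual pipeline cycle count. The sketch assumes one clock cycle per stage and no stalls. The function names and the vector length of 64 are illustrative choices.

```python
# Stages of the floating-point add pipeline named on the slide.
STAGES = ["compare exponents", "align mantissas", "add mantissas", "normalize"]


def unpipelined_cycles(n_elements, n_stages=len(STAGES)):
    """Each addition passes through all stages before the next one starts."""
    return n_elements * n_stages


def pipelined_cycles(n_elements, n_stages=len(STAGES)):
    """The first result takes n_stages cycles, then one result per cycle."""
    if n_elements == 0:
        return 0
    return n_stages + (n_elements - 1)


n = 64
print(unpipelined_cycles(n), pipelined_cycles(n))
print(f"speedup: {unpipelined_cycles(n) / pipelined_cycles(n):.2f}")
```

For long vectors the speedup approaches the number of stages, which is why pipelined arithmetic units suit vector operands.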

  • 8/16/2019 PP16-lec5-arch4.ppt

    18/27

    1.18

    Summary: Parallel Platforms, Memory & Interconnect Configurations

    • Memory Config (Physical vs Logical)
      - Physical Memory config (SM, DM, CSM)
      - Logical Address Space config (SAS, NSAS)
      - Combinations

    • CSM-SAS (SMP, UMA)

    • DM-SAS (DSM, NUMA)

    • DM-NSAS (Multicomputer/Clusters)

    • Interconnection Network:
      o Interface level: memory bus (using ME) in SM-multiprocessors (UMA, NUMA) vs I/O bus (using NI) in multicomputer/cluster
      o Data Exchange/synch: Shared Data model vs Message Passing model

  • 8/16/2019 PP16-lec5-arch4.ppt

    19/27

    Homework:

    self-assessed problems

    Please mark your solution & note the marks you achieved.

    1.19

  • 8/16/2019 PP16-lec5-arch4.ppt

    20/27

    Problems:

    Explicit Parallel Architectures

    1.20

  • 8/16/2019 PP16-lec5-arch4.ppt

    21/27

    1.21

    Example Problem 1: Bus-based SM-Multiprocessor, Limit of Parallelism

    Consider an SM-Multiprocessor using 32-bit RISC processors running at 150 MHz, each carrying out one instruction per clock cycle. Assume 15% data-load & 10% data-store instructions, using a shared bus having 2 GB/sec BW. Compute the max number of processors possible to connect on the above bus for the following parallel configurations:

  • 8/16/2019 PP16-lec5-arch4.ppt

    22/27

    1.22

    Example Problems: Bus-based SM-Multiprocessor, Limit of Parallelism (cont'd)

    (a) SMP (without cache memory)

    (b) SMP with cache memory having hit-ratio of 95% & memory write-through policy

    (c) NUMA with program Locality factor = 80%
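
One possible way to set up Example Problem 1 is sketched below. It is not the lecturer's solution and rests on several assumptions:

- every load or store moves one 32-bit word over the bus;
- instruction fetches do not use the bus;
- a cache miss transfers a single word;
- with write-through, every store goes to the bus;
- in the NUMA case only non-local accesses use the bus;
- 2 GB/sec is taken as 2 x 10^9 bytes/sec.

```python
import math

CLOCK_HZ = 150e6     # one instruction per clock cycle
WORD_BYTES = 4       # 32-bit data word per load/store (assumption)
BUS_BW = 2e9         # bytes/sec, assuming 1 GB = 10**9 bytes
LOAD_FRACTION = 0.15
STORE_FRACTION = 0.10


def max_processors(bus_accesses_per_instruction):
    """Largest processor count whose combined traffic fits on the bus."""
    demand = CLOCK_HZ * bus_accesses_per_instruction * WORD_BYTES  # bytes/sec
    return math.floor(BUS_BW / demand)


# (a) No cache: every load and store uses the bus.
smp_no_cache = max_processors(LOAD_FRACTION + STORE_FRACTION)

# (b) Cache with 95% hit ratio and write-through: only load misses
#     and all stores reach the bus.
HIT_RATIO = 0.95
smp_with_cache = max_processors(LOAD_FRACTION * (1 - HIT_RATIO) + STORE_FRACTION)

# (c) NUMA with 80% locality: only the non-local accesses use the bus.
LOCALITY = 0.80
numa = max_processors((LOAD_FRACTION + STORE_FRACTION) * (1 - LOCALITY))

print(smp_no_cache, smp_with_cache, numa)
```

Under these assumptions the script prints 13, 31 and 66. A different reading of the problem, such as cache-line transfers on a miss or a binary gigabyte, would change the figures.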

  • 8/16/2019 PP16-lec5-arch4.ppt

    23/27

    1.23

    Bus-based interconnects: (a) with no local caches; (b) with local memory/caches.

    Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.

    Example: SMP (SM & Shared bus IN)

  • 8/16/2019 PP16-lec5-arch4.ppt

    24/27

    1.24

    UMA & NUMA Arch Block Diagrams

    Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.

    UMA (CSM-SAS)    NUMA (DM-SAS = DSM)

    Both are SM-multiprocessors, differing in Memory Access Delay format.

  • 8/16/2019 PP16-lec5-arch4.ppt

    25/27

    Homework:

    self-assessed problem

    Please mark your solution & note the marks you achieved.

    1.25


  • 8/16/2019 PP16-lec5-arch4.ppt

    26/27

    1.26

    Example Problem 2: Message Passing Multicomputer, Local vs Remote memory data access delays

    Consider a 64-node multicomputer; each node comprises a 32-bit RISC processor having a 250 MHz clock rate & 8 MB local memory. The local memory access requires 4 clock cycles, the remote comm initiate (setup) overhead is 15 clock cycles & the Interconnection Network BW is 80 MB/sec. The total number of instructions executed is 200000. If memory data load & store are 15% & 10% respectively of the instructions, compute:

    (a) Load/store time if all accesses are to local nodes

    (b) Load/store time if 20% of accesses are to remote nodes

    Note: Assume packet lengths are variable (depend on addr & data bytes) & communication protocol given...

    Size of packet fields is in multiples of bytes.
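
A possible setup for Example Problem 2 is sketched below. The packet protocol referred to in the note is not reproduced in this transcript, so the packet size is a hypothetical parameter (8 bytes: a 4-byte address plus a 4-byte data word, with no header). The remote-access model is also an assumption: one setup overhead, one packet transfer, and one memory access at the remote node. Only part (a) is independent of these assumptions.

```python
CLOCK_HZ = 250e6
CYCLE_S = 1 / CLOCK_HZ            # 4 ns per clock cycle
INSTRUCTIONS = 200_000
ACCESS_FRACTION = 0.15 + 0.10     # loads + stores
LOCAL_ACCESS_CYCLES = 4
REMOTE_SETUP_CYCLES = 15
NETWORK_BW = 80e6                 # bytes/sec, assuming 1 MB = 10**6 bytes
PACKET_BYTES = 8                  # HYPOTHETICAL: protocol not given in the transcript


def load_store_time(remote_fraction, packet_bytes=PACKET_BYTES):
    """Total load/store time in seconds under the assumptions stated above."""
    accesses = INSTRUCTIONS * ACCESS_FRACTION
    local_s = LOCAL_ACCESS_CYCLES * CYCLE_S
    remote_s = (REMOTE_SETUP_CYCLES * CYCLE_S       # setup overhead
                + packet_bytes / NETWORK_BW         # transfer over the network
                + LOCAL_ACCESS_CYCLES * CYCLE_S)    # memory access at remote node
    return accesses * ((1 - remote_fraction) * local_s + remote_fraction * remote_s)


all_local_s = load_store_time(0.0)
mixed_s = load_store_time(0.20)
print(f"(a) all local:  {all_local_s * 1e3:.3f} ms")
print(f"(b) 20% remote: {mixed_s * 1e3:.3f} ms (with the assumed packet size)")
```

Part (a) comes to 50000 accesses x 4 cycles x 4 ns = 0.8 ms. The figure for part (b) depends entirely on the assumed packet size and remote-access model, so it should be recomputed once the actual protocol is known.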


  • 8/16/2019 PP16-lec5-arch4.ppt

    27/27

    Example Problem 2 (cont'd): Message Passing Multicomputer, Local vs Remote memory data access delays

    [Figure labels: Processor; Interconnection network; Local memory; Computers; Messages]