PP16-lec3-arch2
8/16/2019 PP16-lec3-arch2
http://slidepdf.com/reader/full/pp16-lec3-arch2 1/23
Parallel Processing sp2016
lec#3
Dr M Shamim Baig
Implicit Parallel Architectures:
ILP processors
• Pipelined Processors
• Superscalar Processor
• VLIW Processor
Pipeline Performance
• Instruction & Arithmetic-unit Pipeline
• Ideal pipeline Speed-up calculation & Limits
• Chained Pipeline Performance
• The speed-up of a pipeline is eventually limited by the number of stages & the time of the slowest stage.
• For this reason, conventional processors tried very deep pipelines (a 20-stage pipeline is an example of a deep pipeline, compared to a normal pipeline of 3-6 stages).
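The two limits named above (number of stages, time of the slowest stage) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a stall-free pipeline whose clock period equals the slowest stage, and the function name `pipeline_speedup` is hypothetical, not from the lecture.

```python
def pipeline_speedup(stage_times, n_instructions):
    """Speed-up of a k-stage pipeline over unpipelined execution.

    Assumptions (not from the slides): no stalls, and the clock
    period is set by the slowest stage.
    """
    k = len(stage_times)
    cycle = max(stage_times)                       # slowest stage sets the clock
    unpipelined = n_instructions * sum(stage_times)
    pipelined = (k + n_instructions - 1) * cycle   # k cycles to fill, then 1 per instruction
    return unpipelined / pipelined


if __name__ == "__main__":
    # Balanced 5-stage pipeline: speed-up approaches the stage count (5).
    print(pipeline_speedup([1, 1, 1, 1, 1], 1000))
    # One slow stage (2 time units) caps the speed-up well below 5.
    print(pipeline_speedup([1, 1, 2, 1, 1], 1000))
```

For a long instruction stream the balanced case tends towards the number of stages, while a single slow stage pulls the bound down to (sum of stage times) / (slowest stage time).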
Pipeline Performance Bottlenecks
• A pipeline has the following performance bottlenecks:
– Resource Constraint
– Data Dependency
– Branch Prediction
• Approx. every 5-6th instruction is a conditional jump. This requires very accurate branch prediction.
• The penalty of a prediction error grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
• Hence the need for better solutions (than deep pipelines)
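A rough way to see why misprediction hurts deep pipelines more is to estimate the average cycles per instruction (CPI). This is a minimal sketch under stated assumptions: the 10% misprediction rate and the flush penalty of (depth - 1) cycles are illustrative values, not figures from the lecture, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Average CPI when each mispredicted branch costs a pipeline flush."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    branch_fraction = 1 / 5      # roughly every 5th instruction is a conditional jump
    mispredict_rate = 0.10       # assumed predictor miss rate
    for depth in (5, 20):
        penalty = depth - 1      # assumed: all younger stages are flushed
        print(depth, effective_cpi(branch_fraction, mispredict_rate, penalty))
```

With the same predictor, the assumed 20-stage pipeline loses several times more cycles per instruction to flushes than the 5-stage one.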
Implicit Parallel Architectures:
ILP processors
• Pipelined Processor
• Superscalar Processor
• VLIW Processor
Superscalar Processor
• One simple way of alleviating the deep pipeline bottlenecks is to use multiple (concurrent) short pipelines.
• Issue multiple independent instructions simultaneously
– Examples: MIPS R10000, PowerPC & Pentium
• The question then becomes one of selecting or scheduling these instructions for simultaneous issuing.
Superscalar Scheduler
• Superscalar scheduler is in-chip hardware that looks at a number of instructions in an instruction queue at runtime & selects an appropriate number of instructions to execute concurrently.
• Scheduling of instructions concurrently is determined by a number of factors:
– Resolve Data Dependency Issues
– Resolve Resource Constraint Issues
– Resolve Branch Prediction Issues
• Cost/complexity of Scheduler hardware & its performance constraints (discussed later) are important issues of superscalar processors.
Example: two-way superscalar execution of instructions
[Figure: two-way superscalar execution timeline; stage labels IF, ID, NA, NA, WB, with a legend marking NA stages as not required]
The example illustrates that different instruction mixes with identical semantics can take significantly different execution times.
Execution-unit constraints or data dependencies can cause additional delays compared with the ideal pipeline.
Superscalar Execution: Resource Waste
• In the above example, there is some wastage of execution-unit resources
Superscalar Execution:
Efficiency Considerations
• Not all functional units can be kept busy at all times.
• If during a cycle, no functional units are utilized, this is referred to as vertical waste.
• If during a cycle, only some of the functional units are utilized, this is referred to as horizontal waste.
• Due to limited parallelism in typical instruction traces (dependencies) & limited time/scope of the scheduler to extract parallelism, the performance of superscalar processors is eventually limited.
• Conventional microprocessors typically support four-way superscalar execution.
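The two kinds of waste can be counted from a per-cycle record of how many functional units were busy. The sketch below is a hypothetical bookkeeping helper (the name `classify_waste` and the sample trace are assumptions for illustration), not a model of any particular processor.

```python
def classify_waste(units_busy_per_cycle, n_units):
    """Return (vertical, horizontal) wasted unit-slots.

    vertical:   slots lost in cycles where no unit was used
    horizontal: slots lost in cycles where only some units were used
    """
    vertical = horizontal = 0
    for busy in units_busy_per_cycle:
        if busy == 0:
            vertical += n_units
        elif busy < n_units:
            horizontal += n_units - busy
    return vertical, horizontal


if __name__ == "__main__":
    # Assumed trace for a two-way machine over five cycles.
    trace = [2, 0, 1, 2, 1]
    print(classify_waste(trace, n_units=2))
```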
Superscalar Execution:
Instruction Issue Mechanisms
• In the simpler model, instructions can be issued only in the order in which they are encountered, i.e. if the second instruction cannot be issued because it has a data dependency with the first, only one instruction is issued in the cycle. This is called in-order issue.
• In a more aggressive model, instructions can be issued out of order. In this case, if the second instruction has data dependencies with the first, but the third instruction does not, the first and third instructions can be co-scheduled. This is also called dynamic issue.
• Performance of in-order issue is generally limited
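The difference between the two issue policies can be sketched with a toy simulator. The model is an assumption made for illustration: every instruction completes one cycle after it is issued, the issue width is 2, and `deps[i]` lists the earlier instructions that instruction `i` depends on. The function name `issue_cycles` is hypothetical.

```python
def issue_cycles(deps, width=2, in_order=True):
    """Number of cycles needed to issue all instructions.

    deps[i] is the set of earlier instruction indices that
    instruction i depends on (assumed one-cycle latency).
    """
    issued_at = [None] * len(deps)
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        issued_now = []
        for i in remaining:
            ready = all(
                issued_at[d] is not None and issued_at[d] < cycle
                for d in deps[i]
            )
            if ready and len(issued_now) < width:
                issued_now.append(i)
            elif in_order:
                break  # in-order: a blocked instruction stalls everything behind it
        for i in issued_now:
            issued_at[i] = cycle
        remaining = [i for i in remaining if i not in issued_now]
        cycle += 1
    return cycle


if __name__ == "__main__":
    # Instruction 1 depends on 0, instruction 3 depends on 2.
    deps = [set(), {0}, set(), {2}]
    print("in-order:    ", issue_cycles(deps, in_order=True))
    print("out-of-order:", issue_cycles(deps, in_order=False))
```

In this trace the out-of-order policy co-schedules instructions 0 and 2 in the first cycle, which is exactly the case described in the second bullet.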
Implicit Parallel Architectures:
ILP processors
• Pipelined Processor
• Superscalar Processor
• VLIW Processor
Very Long Instruction Word (VLIW) Processors
• Hardware cost/complexity & the time/scope constraint of runtime scheduling are the major issues in superscalar design.
• To address these issues, VLIW processors rely on compile-time analysis to identify & bundle together instructions that can be executed concurrently
• These instructions are packed & dispatched together, thus the name very long instruction word
• Typical VLIW processors are limited to 4 to 8-way parallelism. Variants of this concept are employed in Intel IA64 processors & TI TMS320 C6xxx DSPs
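The compile-time bundling step can be sketched as a greedy pass over the instruction list. This is a deliberately simplified illustration, not how a production VLIW compiler works: it keeps program order (no reordering), treats only register writes within the current bundle as conflicts, and the `Instr` type, register names and `bundle` function are all hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written
    srcs: tuple    # registers read


def bundle(instrs, width=4):
    """Greedily pack consecutive independent instructions into bundles."""
    bundles, current, written = [], [], set()
    for ins in instrs:
        # Conflict if this instruction reads or rewrites a register
        # produced by an instruction already in the current bundle.
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if conflict or len(current) == width:
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    program = [
        Instr("r1", ("a", "b")),
        Instr("r2", ("c", "d")),
        Instr("r3", ("r1", "r2")),   # needs r1 and r2, so it starts a new bundle
        Instr("r4", ("e",)),
    ]
    for b in bundle(program):
        print([i.dest for i in b])
```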
TMS320C6x has dual data paths & orthogonal instruction units which boost overall performance
A high performance DSP: 8-way VLIW processor
Comparison: Superscalar vs Very Long Instruction Word (VLIW)
• Superscalar implements the Scheduler as in-chip Hardware, while VLIW implements it in compiler software.
• Superscalar schedules concurrent instructions at runtime, while VLIW does it at compile-time.
• Superscalar scheduler scope is limited to a few instructions from the instruction-queue, while the VLIW scheduler has a bigger context (may be the full program) to process.
• Due to more time & context, the VLIW scheduler can use more powerful algorithms (e.g. loop unrolling, branch prediction etc.) giving better results, which Superscalar can't afford
• Compilers, however, do not have runtime information (e.g. cache misses, branch variable state etc.), so VLIW Scheduling is inherently more conservative than Superscalar
Explicitly Parallel Processor architectures:
Task-level Parallelism
Flynn's Classification for Parallel Processor Architecture
• Instruction Stream & Data Streams based classification (SISD, MISD, SIMD, MIMD)
• Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.
• If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).
• If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
SIMD and MIMD Processors
A typical SIMD architecture (a) and a typical MIMD architecture (b).
[Figure labels: MEMORY in both panels; (a) one instruction stream IS with data streams DS1 ... DSn; (b) instruction streams IS1 ... ISn with data streams DS1 ... DSn]
SIMD Processors
• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged to this class of machines.
• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors, DSP chips such as the Sharc & Nvidia's GPUs.
• SIMD relies on the regular structure of computations (such as those in image processing).
• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines if a processor should participate in a computation or not.
Ex: Conditional Execution in SIMD Processors
Executing a conditional statement on an SIMD computer with four processors:
(a) the conditional statement; (b) the execution of the statement in two steps.
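Since the figure itself did not survive, the two-step execution can be sketched in code. The conditional used here, `if (B == 0) C = A; else C = A / B;`, and the data values are assumptions chosen for illustration; each list index stands for one of the four processors, and the loops emulate what the hardware would do in lock-step.

```python
def simd_conditional(A, B):
    """Emulate `if (B == 0) C = A; else C = A / B` on a SIMD machine."""
    n = len(A)
    C = [None] * n
    # Activity mask: True where the condition holds on that processor.
    mask = [b == 0 for b in B]

    # Step 1: processors with the mask set execute C = A; the rest idle.
    for p in range(n):
        if mask[p]:
            C[p] = A[p]

    # Step 2: mask is inverted; the other processors execute C = A / B.
    for p in range(n):
        if not mask[p]:
            C[p] = A[p] / B[p]
    return C


if __name__ == "__main__":
    A = [5, 4, 1, 0]
    B = [0, 2, 1, 0]
    print(simd_conditional(A, B))
```

Both branches take a full step each, so some processors are idle in every step; this is the cost of conditional execution under a single instruction stream.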
Programming Models: MPMD/SPMD
• In contrast to SIMD processors, MIMD processors can execute different programs on different processors
• There are two programming models for parallel processing, called Multiple/Single Program Multiple-Data (MPMD/SPMD): execute different/same program on different processors
• SIMD supports only the SPMD model. Although MIMD supports both models of programming (MPMD & SPMD), SPMD is the preferred choice due to easier software management
• Examples of MIMD platforms include current generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters & IBM SP.
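The distinction between the two models can be sketched as follows. This is a sequential emulation for illustration only (no real parallel runtime is used), and the names `spmd_program`, `run_spmd` and `run_mpmd` are hypothetical: in SPMD every processor runs the same program and uses its rank to pick its share of the data, while in MPMD each processor is handed a different program.

```python
def spmd_program(rank, size, data):
    """Same program on every processor; the rank selects the data slice."""
    my_part = data[rank::size]
    return sum(my_part)


def run_spmd(data, size=4):
    # Emulates `size` processors, each running spmd_program with its own rank.
    partial_sums = [spmd_program(rank, size, data) for rank in range(size)]
    return sum(partial_sums)


def run_mpmd(programs, data):
    # Emulates one processor per program, each running a different program.
    return [program(data) for program in programs]


if __name__ == "__main__":
    print(run_spmd(list(range(10))))          # one program, four ranks
    print(run_mpmd([min, max], [3, 1, 2]))    # two different programs
```

Only one program has to be written, built and deployed in the SPMD case, which is the software-management advantage the slide refers to.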
Comparison: SIMD vs MIMD
• Control flow: Synchronous in SIMD vs Asynchronous in MIMD
• Programming-model: SIMD supports only the SPMD prog-model, while MIMD supports both (SPMD & MPMD) prog-models
• Cost: SIMD computers require less hardware than MIMD computers (single control unit).
– However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.
– In contrast, MIMD processors can be built from inexpensive off-the-shelf components with relatively little effort in a short time
• Flexibility: SIMD performs very well for specialized / regular applications but not for all applications, while MIMD is more flexible & general purpose.