PP16-lec3-arch2

8/16/2019 PP16-lec3-arch2

http://slidepdf.com/reader/full/pp16-lec3-arch2 1/23


Parallel Processing sp2016

lec#3

Dr M Shamim Baig


Implicit Parallel Architectures:

ILP processors

• Pipelined Processors

• Superscalar Processor

• VLIW Processor


Pipeline Performance

• Instruction & Arithmetic-unit Pipeline

• Ideal pipeline speed-up calculation & limits

• Chained Pipeline Performance

• The speed-up of a pipeline is ultimately limited by the number of stages & the time of the slowest stage.

• For this reason, conventional processors moved to very deep pipelines (a 20-stage pipeline is an example of a deep pipeline, compared to a normal pipeline of only a few stages).
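The limit on speed-up can be made concrete with the standard textbook timing model, which is an assumption here since the slides do not give the formula: a k-stage pipeline clocked at the slowest stage time finishes n instructions in (k + n - 1) cycles. The function and parameter names below are illustrative only.

```python
def pipeline_speedup(stage_times, n_instructions):
    """Speed-up of a linear pipeline over unpipelined execution.

    Assumes the clock period equals the slowest stage time and that
    there are no stalls (the ideal case).
    """
    k = len(stage_times)
    cycle = max(stage_times)                      # slowest stage sets the clock
    unpipelined = n_instructions * sum(stage_times)
    pipelined = (k + n_instructions - 1) * cycle  # fill time plus one result per cycle
    return unpipelined / pipelined


# Balanced 5-stage pipeline: the speed-up approaches the stage count.
balanced = pipeline_speedup([1, 1, 1, 1, 1], 1000)

# Same number of stages, but one stage is twice as slow: the limit drops.
unbalanced = pipeline_speedup([1, 1, 2, 1, 1], 1000)

print(f"balanced: {balanced:.2f}, unbalanced: {unbalanced:.2f}")
```

With many instructions the balanced case tends toward the number of stages, while the unbalanced case tends toward (sum of stage times) / (slowest stage time).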


Pipeline Performance Bottlenecks

• A pipeline has the following performance bottlenecks:

– Resource Constraint

– Data Dependency

– Branch Prediction

• Roughly every fifth to sixth instruction is a conditional jump. This requires very accurate branch prediction.

• The penalty of a prediction error grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.

• Hence the need for better solutions (than deep pipelines).
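A rough sketch of why the misprediction penalty matters more in deep pipelines. The model is a simplification assumed for illustration: ideal CPI of 1, a flush that costs (depth - 1) cycles, and made-up default rates (one branch in five instructions, 5% of branches mispredicted).

```python
def effective_cpi(depth, branch_fraction=0.2, mispredict_rate=0.05):
    """Average cycles per instruction including branch-flush stalls.

    Assumes an ideal CPI of 1 and that each misprediction flushes
    (depth - 1) instructions that were already in the pipeline.
    """
    flush_penalty = depth - 1
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty


# The same predictor accuracy costs more as the pipeline gets deeper.
for depth in (5, 10, 20):
    print(depth, round(effective_cpi(depth), 3))
```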



ILP processors

• Pipelined Processor

• Superscalar Processor

• VLIW Processor


Superscalar Processor

• One simple way of alleviating the deep-pipeline bottlenecks is to use multiple (concurrent) short pipelines.

• Issue multiple independent instructions simultaneously

– Examples: MIPS R10000, PowerPC, Pentium

• The question then becomes one of selecting or scheduling these instructions for simultaneous issuing.


Superscalar Scheduler

• The superscalar scheduler is on-chip hardware that looks at a number of instructions in an instruction queue at runtime & selects an appropriate number of instructions to execute concurrently.

• Scheduling of instructions concurrently is determined by a number of factors:

– Resolve Data Dependency Issues

– Resolve Resource Constraint Issues

– Resolve Branch Prediction Issues

• The cost/complexity of the scheduler hardware & its performance constraints (discussed later) are important issues of superscalar processors.


Example: two-way superscalar execution of instructions

[Figure: pipeline diagram of the example; stage labels IF, ID, NA, NA, WB, with some stages marked as not required]

The example illustrates that different instruction mixes with identical semantics can take significantly different execution times.

Execution-unit constraints or data dependencies can cause additional delays compared to the ideal pipeline.
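The figure itself did not survive extraction, so the sketch below uses a hypothetical pair of instruction mixes (both sum four numbers a, b, c, d) to illustrate the point. It assumes a two-way, in-order machine in which every instruction takes one cycle and an instruction cannot issue in the same cycle as one that writes a register it uses.

```python
def cycles_two_way(instrs, width=2):
    """Count issue cycles for in-order issue of (dest, sources) instructions."""
    cycles = 0
    i = 0
    while i < len(instrs):
        written = set()                 # registers written during this cycle
        issued = 0
        while i < len(instrs) and issued < width:
            dest, srcs = instrs[i]
            if dest in written or any(s in written for s in srcs):
                break                   # dependency on this cycle: wait
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Mix 1: two partial sums accumulated in separate registers.
mix_parallel = [
    ("R1", []),            # load  R1, a
    ("R2", []),            # load  R2, b
    ("R1", ["R1"]),        # add   R1, c
    ("R2", ["R2"]),        # add   R2, d
    ("R1", ["R1", "R2"]),  # add   R1, R2
    ("mem", ["R1"]),       # store R1
]

# Mix 2: one running sum in a single register.
mix_serial = [
    ("R1", []),            # load  R1, a
    ("R1", ["R1"]),        # add   R1, b
    ("R1", ["R1"]),        # add   R1, c
    ("R1", ["R1"]),        # add   R1, d
    ("mem", ["R1"]),       # store R1
]

print(cycles_two_way(mix_parallel), cycles_two_way(mix_serial))
```

The first mix has more instructions but issues in fewer cycles, because its two partial sums are independent of each other.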


Superscalar Execution: Resource Waste

• In the above example, there is some wastage of execution unit resources.

[Figure: pipeline diagram of the above example; stage labels IF, ID, NA, NA, WB, with some stages marked as not required]


Superscalar Execution: Efficiency Considerations

• Not all functional units can be kept busy at all times.

• If during a cycle, no functional units are utilized, this is referred to as vertical waste.

• If during a cycle, only some of the functional units are utilized, this is referred to as horizontal waste.

• Due to limited parallelism in typical instruction traces (dependencies) & the limited time/scope of the scheduler to extract parallelism, the performance of superscalar processors is eventually limited.

• Conventional microprocessors typically support four-way superscalar execution.
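The two kinds of waste defined above can be tallied from a per-cycle issue count. This is a minimal sketch; measuring waste in unused issue slots and the sample trace are assumptions made for illustration.

```python
def issue_slot_waste(issued_per_cycle, width):
    """Return (vertical, horizontal) waste measured in unused issue slots."""
    vertical = horizontal = 0
    for issued in issued_per_cycle:
        if issued == 0:
            vertical += width               # no unit used: the whole cycle is lost
        else:
            horizontal += width - issued    # only some units used
    return vertical, horizontal


# Hypothetical trace of a four-way machine: instructions issued in each cycle.
trace = [4, 2, 0, 1, 4, 0, 3]
vertical, horizontal = issue_slot_waste(trace, width=4)
utilization = sum(trace) / (4 * len(trace))

print(vertical, horizontal, utilization)
```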



Superscalar Execution: Instruction Issue Mechanisms

• In the simpler model, instructions can be issued only in the order in which they are encountered, i.e. if the second instruction cannot be issued because it has a data dependency with the first, only one instruction is issued in the cycle. This is called in-order issue.

• In a more aggressive model, instructions can be issued out of order. In this case, if the second instruction has data dependencies with the first, but the third instruction does not, the first and third instructions can be co-scheduled. This is also called dynamic issue.

• Performance of in-order issue is generally limited.
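The difference between the two issue models can be sketched with a small simulation. Assumptions made here for illustration: every instruction takes one cycle, only true data dependencies matter (as if registers were renamed), and dependencies are given directly as indices of earlier instructions.

```python
def issue_cycles(deps, width=2, in_order=True):
    """Return the number of cycles needed to issue all instructions.

    deps[i] lists the indices of earlier instructions that instruction i
    depends on. An instruction is ready once all of them issued in an
    earlier cycle.
    """
    n = len(deps)
    issue_cycle = [None] * n            # cycle in which each instruction issued
    cycle = 0
    while None in issue_cycle:
        cycle += 1
        issued_now = 0
        for i in range(n):
            if issue_cycle[i] is not None:
                continue
            ready = all(issue_cycle[d] is not None and issue_cycle[d] < cycle
                        for d in deps[i])
            if ready and issued_now < width:
                issue_cycle[i] = cycle
                issued_now += 1
            elif in_order:
                break                   # nothing may pass a stalled instruction
    return cycle


# Second instruction depends on the first; the third is independent,
# and the fourth depends on the third.
deps = [[], [0], [], [2]]

print(issue_cycles(deps, in_order=True), issue_cycles(deps, in_order=False))
```

In the out-of-order run the first and third instructions are co-scheduled in cycle 1, which is the case described in the second bullet.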




ILP processors

• Pipelined Processor

• Superscalar Processor

• VLIW Processor



Very Long Instruction Word (VLIW) Processors

• Hardware cost/complexity & the time/scope constraints of runtime scheduling are the major issues in superscalar design.

• To address these issues, VLIW processors rely on compile-time analysis to identify & bundle together instructions that can be executed concurrently.

• These instructions are packed & dispatched together, thus the name very long instruction word.

• Typical VLIW processors are limited to 4 to 8-way parallelism. Variants of this concept are employed in Intel IA-64 processors & TI TMS320C6x DSPs.
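The compile-time bundling idea can be sketched as greedy list scheduling. This is only an illustration under assumed conditions (unit latency, dependencies known exactly, any slot can hold any instruction), not how a particular VLIW compiler works; the instruction names are made up.

```python
def pack_bundles(deps, names, width=4):
    """Greedily group independent instructions into fixed-width bundles.

    deps[i] lists the indices of earlier instructions that instruction i
    depends on. Unused slots are padded with "nop".
    """
    n = len(deps)
    bundle_of = [None] * n              # bundle index assigned to each instruction
    bundles = []
    while None in bundle_of:
        current = len(bundles)
        bundle = []
        for i in range(n):
            if bundle_of[i] is not None or len(bundle) == width:
                continue
            # All producers must sit in a strictly earlier bundle.
            if all(bundle_of[d] is not None and bundle_of[d] < current
                   for d in deps[i]):
                bundle_of[i] = current
                bundle.append(names[i])
        bundle += ["nop"] * (width - len(bundle))
        bundles.append(bundle)
    return bundles


# Hypothetical program: e = (a + b) * d, then store e.
names = ["ld a", "ld b", "add c", "ld d", "mul e", "st e"]
deps = [[], [], [0, 1], [], [2, 3], [4]]
bundles = pack_bundles(deps, names)

for b in bundles:
    print(b)
```

The padded "nop" slots show where the compiler could not find enough independent work, which corresponds to the waste discussed for superscalar execution.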



TMS320C6x has dual data paths & orthogonal instruction units, which boost overall performance.

A high-performance DSP: 4-way VLIW processor



Comparison: Superscalar vs Very Long Instruction Word (VLIW)

• Superscalar implements the scheduler as on-chip hardware, while VLIW implements it in compiler software.

• Superscalar schedules concurrent instructions at runtime, while VLIW does it at compile time.

• The superscalar scheduler's scope is limited to a few instructions from the instruction queue, while the VLIW scheduler has a bigger context (possibly the full program) to process.

• Due to more time & context, the VLIW scheduler can use more powerful algorithms (e.g. loop unrolling, branch prediction, etc.), giving better results, which superscalar cannot afford.

• Compilers, however, do not have runtime information (e.g. cache misses, branch variable state, etc.), so VLIW scheduling is inherently more conservative than superscalar.



Explicitly Parallel Processor Architectures:

Task-level Parallelism


Flynn's Classification for Parallel Processor Architecture

• Classification based on Instruction Stream & Data Streams (SISD, MISD, SIMD, MIMD)

• Processing units in parallel computers either operate under the centralized control of a single control unit or work independently.

• If there is a single control unit that dispatches the same instruction to various processors (that work on different data), the model is referred to as single instruction stream, multiple data stream (SIMD).

• If each processor has its own control unit, each processor can execute different instructions on different data items. This model is called multiple instruction stream, multiple data stream (MIMD).
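The contrast between the two models can be sketched in a few lines. The code runs sequentially and only models who decides which operation each processor performs; the function names and sample data are illustrative.

```python
import operator


def simd_step(op, lanes):
    """One control unit: the same operation is applied to every data item."""
    return [op(a, b) for a, b in lanes]


def mimd_step(ops, lanes):
    """One control unit per processor: each applies its own operation."""
    return [op(a, b) for op, (a, b) in zip(ops, lanes)]


data = [(8, 2), (6, 3), (9, 3), (4, 4)]

simd_result = simd_step(operator.add, data)
mimd_result = mimd_step(
    [operator.add, operator.sub, operator.mul, operator.floordiv], data
)

print(simd_result, mimd_result)
```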



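SIMD and MIMD Processors

A typical SIMD architecture (a) and a typical MIMD architecture (b).

[Figure: (a) SIMD: a single instruction stream IS drives data streams DS1, DS2, DS3, ..., DSn-1, DSn; (b) MIMD: instruction streams IS1, IS2, ..., ISn-1, ISn, each paired with a data stream DS1, DS2, ..., DSn-1, DSn; both panels are connected to memory]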



SIMD Processors

• Some of the earliest parallel computers such as the Illiac IV, MPP, DAP, CM-2, and MasPar MP-1 belonged to this class of machines.

• Variants of this concept have found use in co-processing units such as the MMX units in Intel processors, DSP chips such as the Sharc & Nvidia's GPUs.

• SIMD relies on the regular structure of computations (such as those in image processing).

• It is often necessary to selectively turn off operations on certain data items. For this reason, most SIMD programming paradigms allow for an "activity mask", which determines if a processor should participate in a computation or not.



Example: Conditional Execution in SIMD Processors

Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
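The figure did not survive extraction, so the sketch below uses an assumed conditional statement (`C = A if B == 0 else A / B`) and made-up data for four processors. It shows the two-step execution with an activity mask: lanes where the condition holds run first, then the mask is inverted for the else branch.

```python
def simd_conditional(A, B):
    """Evaluate `C = A if B == 0 else A / B` per lane in two masked steps."""
    n = len(A)
    C = [None] * n
    mask = [b == 0 for b in B]          # activity mask from the condition

    # Step 1: only lanes where the condition holds are active.
    for i in range(n):
        if mask[i]:
            C[i] = A[i]

    # Step 2: the mask is inverted; the remaining lanes run the else branch.
    for i in range(n):
        if not mask[i]:
            C[i] = A[i] / B[i]
    return C


# One data item per processor (four processors).
A = [5, 4, 1, 0]
B = [0, 2, 1, 0]
C = simd_conditional(A, B)

print(C)
```

In each step some processors sit idle, which is why conditional code lowers SIMD utilization.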



Programming Models: MPMD / SPMD

• In contrast to SIMD processors, MIMD processors can execute different programs on different processors.

• There are two programming models for parallel processing (PP), called Multiple/Single Program Multiple-Data (MPMD/SPMD), which execute different/the same program on different processors.

• SIMD supports only the SPMD model. Although MIMD supports both models of programming (MPMD & SPMD), SPMD is the preferred choice for ease of software management.

• Examples of MIMD platforms include current-generation Sun Ultra Servers, SGI Origin Servers, multiprocessor PCs, workstation clusters & the IBM SP.
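A minimal sketch of the SPMD style: every worker runs the same function and uses its rank to pick its share of the data. Threads stand in for processors here, and the names `spmd_program` and `run_spmd` are hypothetical, chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_program(rank, size, data):
    """The same program on every worker; the rank selects the data it handles."""
    chunk = data[rank::size]            # strided share of the data for this rank
    return sum(chunk)


def run_spmd(data, size=4):
    """Launch `size` copies of the same program and combine their results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partial = list(pool.map(lambda r: spmd_program(r, size, data),
                                range(size)))
    return partial, sum(partial)


partial_sums, total = run_spmd(list(range(1, 101)))

print(partial_sums, total)
```

An MPMD version would instead hand a different function to each worker, which is why it takes more effort to manage.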


Comparison: SIMD vs MIMD

• Control flow: Synchronous in SIMD vs Asynchronous in MIMD

• Programming model: SIMD supports only the SPMD programming model, while MIMD supports both (SPMD & MPMD) programming models

• Cost: SIMD computers require less hardware than MIMD computers (single control unit).

– However, since SIMD processors are specially designed, they tend to be expensive and have long design cycles.

– In contrast, MIMD processors can be built from inexpensive off-the-shelf components with relatively little effort in a short time.

• Flexibility: SIMD processors perform very well for specialized/regular applications but not for all applications, while MIMD processors are more flexible & general purpose.
