Frédéric Brégier - LaBRI: HPF Language Extensions for Implementing Parallel Programs that Manipulate Irregular Data Structures (French title: « Extensions du langage HPF pour la mise en œuvre de programmes parallèles manipulant des structures de données irrégulières »)

Transcript of the slides

Page 1

HPF Language Extensions for Implementing Parallel Programs that Manipulate Irregular Data Structures

Frédéric Brégier (LaBRI)

Thesis presented at the Université de Bordeaux I

21 December 1999

Page 2

Context of the Work

•Parallel programs produced by compilation
•HPF: the standard for data-parallel (regular) programs
•Irregular programs still need work: efficiencies are poor
•Optimizations at compile time
•Optimizations at run time (generated at compile time)

Page 3

Plan

•Optimizations at compile time
  •Irregular Data Structure (IDS)
  •A Tree to represent an IDS
•Optimizations at run time
  •Inspection-Execution principles
  •Irregular communications: irregular active processor sets
  •Irregular iteration spaces
  •Scheduling of loops with irregular loop-carried dependencies
  •New data-parallel irregular operation: progressive irregular prefix operation
•Conclusion and Perspectives

Page 4

HPF (High Performance Fortran): a data-parallel language (HPF 1.0: May 1993; HPF 2.0: January 1997)

•Fortran 95 source code + structured comments (!HPF$) describing data distributions and parallel properties
•Target code: SPMD parallel code
•« Owner computes » rule
•Run-time guards and communication generation

For the statement

```fortran
A(I) = B(J) + X
```

every processor runs guarded SPMD code of the following form (pseudocode: `is local`, `Send`, `Receive` and `Owner` stand for run-time services):

```fortran
IF (B(J) is local) THEN
   Send(B(J) to Owner(A(I)))
END IF
IF (A(I) is local) THEN
   Receive(TMP from Owner(B(J)))
   A(I) = TMP + X
END IF
```

(Figure: arrays A and B distributed over processors labelled X and Y, in three panels.)
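The owner-computes guards above can be modelled in a few lines. The following is a minimal Python sketch, not HPF compiler output. It assumes BLOCK-distributed A and B, processors numbered from 0, and a replicated scalar X. It also skips the message when A(I) and B(J) have the same owner, whereas the slide's code would send the value to itself.

```python
# Sketch of the "owner computes" guards for A(I) = B(J) + X.
# Assumptions: A and B BLOCK-distributed over nop processors (ids 0..nop-1),
# 1-based Fortran indices, X replicated on every processor.

def block_owner(i: int, n: int, nop: int) -> int:
    """Processor owning element i (1-based) of an n-element BLOCK-distributed array."""
    bloc = -(-n // nop)  # HPF BLOCK uses blocks of ceiling(n / nop) elements
    return (i - 1) // bloc


def spmd_actions(pid: int, i: int, j: int, n: int, nop: int) -> list:
    """Actions processor pid performs for A(I) = B(J) + X."""
    owner_a = block_owner(i, n, nop)
    owner_b = block_owner(j, n, nop)
    actions = []
    # Guard 1: the owner of B(J) sends it, unless it also owns A(I).
    if pid == owner_b and owner_a != owner_b:
        actions.append(("send B(J)", owner_a))
    # Guard 2: the owner of A(I) receives B(J) if needed, then computes.
    if pid == owner_a:
        if owner_a != owner_b:
            actions.append(("receive TMP", owner_b))
        actions.append(("compute A(I)", pid))
    return actions
```

For example, with N = 8 on 2 processors, `spmd_actions(1, 2, 7, 8, 2)` gives processor 1 a single send to processor 0, and processor 0 receives the value and computes. Every other processor evaluates both guards and does nothing, which is the run-time overhead of the guarded code.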

Page 5

Optimizations at compile time: loop iteration space

•Affine subscript expression (A(I))
•Local loop bounds, computed at compile time for CYCLIC and BLOCK distributions
•Not optimizable for an indirect distribution

Source loop:

```fortran
!HPF$ INDEPENDENT
DO I = 1, N
   A(I) = A(I) + 1
END DO
```

Code generated for processor PID (0 to NOP-1):

```fortran
! CYCLIC distribution
DO I = PID+1, N, NOP
   A(I) = A(I) + 1
END DO

! BLOCK distribution (N divisible by NOP, BLOC = N/NOP)
LB = BLOC * PID + 1
UB = MIN(N, LB + BLOC - 1)
DO I = LB, UB
   A(I) = A(I) + 1
END DO

! INDIRECT distribution: a run-time guard on every iteration
DO I = 1, N
   IF (A(I) is local) THEN
      A(I) = A(I) + 1
   END IF
END DO
```

•Irregular = « what is not regular »: not optimizable at compile time
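As a check on the three generated loops, the following Python sketch lists the iterations each processor executes. It assumes 0-based PID and, for the BLOCK case, N divisible by NOP. The indirect owner map in the usage test is a hypothetical example. The CYCLIC and BLOCK cases have closed forms. The INDIRECT case can only be obtained by scanning every iteration.

```python
# Sketch: iterations executed by processor pid (0 .. nop-1) for
#   DO I = 1, N ; A(I) = A(I) + 1 ; END DO
# under CYCLIC, BLOCK and INDIRECT distributions.

def cyclic_iterations(pid: int, n: int, nop: int) -> list:
    # Closed form known at compile time: DO I = PID+1, N, NOP
    return list(range(pid + 1, n + 1, nop))


def block_iterations(pid: int, n: int, nop: int) -> list:
    # Closed form, assuming N divisible by NOP so that BLOC = N/NOP
    bloc = n // nop
    lb = bloc * pid + 1
    ub = min(n, lb + bloc - 1)  # the last local index is LB + BLOC - 1
    return list(range(lb, ub + 1))


def indirect_iterations(pid: int, owner_map: list) -> list:
    # No closed form: scan all N iterations and test ownership at run time
    return [i for i in range(1, len(owner_map) + 1) if owner_map[i - 1] == pid]
```

In both regular cases, each index from 1 to N is executed by exactly one processor. The `UB = LB + BLOC - 1` bound is what prevents neighbouring blocks from overlapping.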

Page 6

Plan (same as Page 3)

Page 7

Irregular Data Structure (IDS)

•Standard irregular format: indirect-access arrays, for example CSC (Compressed Sparse Column). The example is an 8x8 matrix A with columns I to VIII and rows 1 to 8. The row indices of the non-zeros in each column are: I: 1, 5; II: 2, 5; III: 3; IV: 4, 6, 8; V: 1, 2, 5; VI: 4, 6, 7, 8; VII: 6, 7; VIII: 4, 6, 8.

JA(1:9) = 1 3 5 6 9 12 16 18 21   (start of each column in IA and DA)
IA(1:20) = 1 5 2 5 3 4 6 8 1 2 5 4 6 7 8 6 7 4 6 8   (row indices)
DA(1:20) = non-zero values of A

A(1,1) is stored in DA(JA(1)), since IA(JA(1)) = 1
A(6,4) is stored in DA(JA(4)+1), since IA(JA(4)+1) = 6
A(:,4) is stored in DA(JA(4):JA(5)-1)

•Irregular distribution formats:
!HPF$ DISTRIBUTE JA(BLOCK)
!HPF$ DISTRIBUTE IA(GEN_BLOCK(/5, 10, 5/))
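The CSC accesses listed above can be replayed on the slide's own arrays. The following Python sketch keeps the 1-based contents of JA and IA as data and returns 1-based positions in DA. The helper names are illustrative.

```python
# The CSC arrays of the slide; their contents are 1-based Fortran indices.
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
IA = [1, 5, 2, 5, 3, 4, 6, 8, 1, 2, 5, 4, 6, 7, 8, 6, 7, 4, 6, 8]


def csc_column(col: int) -> list:
    """Positions JA(col) .. JA(col+1)-1 of column col in IA and DA."""
    return list(range(JA[col - 1], JA[col]))


def csc_position(row: int, col: int):
    """1-based position p with IA(p) = row inside column col, or None if A(row,col) = 0."""
    for p in csc_column(col):
        if IA[p - 1] == row:
            return p
    return None
```

Locating A(6,4) requires a search through the row indices of column 4. The position, DA(7) = DA(JA(4)+1), depends on run-time data, which is exactly what defeats compile-time analysis.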

Page 8

Problems at compile time

•Distribution: unknown alignment between the arrays of the IDS
•Data accesses: unknown indexes (indirection). For DA(JA(4)+1), the compiler does not know JA(4); only at run time is JA(4) = 6 read
•This implies additional run-time guards and communications
•The result is inefficient SPMD code

(Figure: JA(1:9) = 1 3 5 6 9 12 16 18 21 and the row indices 1 5 2 5 3 4 6 8 1 2 5 4 6 7 8 6 7 4 6 8 above DA(1:20); the element DA(JA(4)+1) cannot be located until JA(4) = 6 is known.)

Page 9

Related Work

•Regular-to-irregular compilation
  •Bik and Wijshoff: « Sparse Compiler »
    •Sparse matrix with known topology
    •Regular analysis + known topology
    •IDS chosen by the compiler
  •Pingali et al.
    •Relational description (between components and access functions)
    •Non-standard and difficult notations
•Compilation of irregular programs
  •Vienna Fortran Compilation System: SPARSE directive
    •Storage format specification
    •Limited to storage formats known by the compiler

Page 10

Plan (same as Page 3)

Page 11

The Tree: a generic data structure with hierarchical access

•From the data to a tree. (Figure: the example matrix, columns I to VIII and rows 1 to 8, becomes a two-level tree. Level 1 has one node per column. Level 2 holds that column's non-zeros, with row indices 1 5 | 2 5 | 3 | 4 6 8 | 1 2 5 | 4 6 7 8 | 6 7 | 4 6 8.)
•Representation in HPF2: Fortran 95 derived data types

```fortran
type level2
   integer :: ROW   ! row number
   real    :: VAL   ! non-zero value
end type level2

type level1
   type (level2), pointer :: COL(:)   ! column
end type level1

type (level1), allocatable :: A(:)    ! matrix with hierarchical access by column
!HPF$ TREE
```

Correspondence Tree / Matrix / CSC:
A(i)%COL(j)%VAL  =  A(r,i), the j-th non-zero of column i, with r = A(i)%COL(j)%ROW  =  DA(JA(i)+j-1)
A(i)%COL(:)%VAL  =  the non-zeros of A(:,i)  =  DA(JA(i):JA(i+1)-1)
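The correspondence above can be checked by building the tree from the CSC arrays. In the following Python sketch, a list of lists stands in for the Fortran derived types. `Level2` mirrors `type level2`, and the list index plays the role of A(i)%COL(j). Filling DA with its own positions is an illustrative choice that makes the mapping visible.

```python
from dataclasses import dataclass

# CSC arrays of the example (1-based contents).
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
IA = [1, 5, 2, 5, 3, 4, 6, 8, 1, 2, 5, 4, 6, 7, 8, 6, 7, 4, 6, 8]


@dataclass
class Level2:
    """Mirrors Fortran 'type level2': one stored non-zero of a column."""
    row: int    # ROW
    val: float  # VAL


def csc_to_tree(ja, ia, da):
    """Return a list whose entry i-1 plays the role of A(i)%COL(:)."""
    tree = []
    for i in range(1, len(ja)):
        # A(i)%COL(:) corresponds to DA(JA(i):JA(i+1)-1)
        tree.append([Level2(ia[p - 1], da[p - 1]) for p in range(ja[i - 1], ja[i])])
    return tree
```

With `DA = [float(p) for p in range(1, 21)]`, the entry for A(4)%COL(2) is `Level2(row=6, val=7.0)`. This is A(6,4), stored at DA(JA(4)+2-1) = DA(7). In the tree, the column-to-non-zeros relationship is explicit instead of hidden behind JA.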

Page 12

Distribution of a TREE

(Figure: the tree of the example matrix, columns I to VIII, distributed by its first level.)

!HPF$ DISTRIBUTE A(BLOCK)
or
!HPF$ DISTRIBUTE A(INDIRECT(/1,2,3,2,1,2,3,1/))
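Distributing the first level of the tree moves whole columns, with all their non-zeros. The following Python sketch counts the non-zeros each processor receives under both directives. It assumes 3 processors, since the INDIRECT map uses processors 1 to 3, and the standard HPF BLOCK size of ceiling(8/3) = 3 columns.

```python
# Non-zeros per column of the example matrix (columns I..VIII), derived from JA.
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
NNZ = [JA[c + 1] - JA[c] for c in range(8)]

INDIRECT_MAP = [1, 2, 3, 2, 1, 2, 3, 1]  # INDIRECT(/1,2,3,2,1,2,3,1/)


def block_map(ncols: int, nprocs: int) -> list:
    """Owner (numbered from 1) of each column under HPF BLOCK."""
    bloc = -(-ncols // nprocs)  # ceiling(ncols / nprocs)
    return [c // bloc + 1 for c in range(ncols)]


def nonzeros_per_processor(owner_map: list, nprocs: int) -> list:
    """Number of stored non-zeros (tree leaves) each processor receives."""
    load = [0] * nprocs
    for col, proc in enumerate(owner_map):
        load[proc - 1] += NNZ[col]
    return load
```

Under this assumption, BLOCK on the tree gives 5, 10 and 5 non-zeros per processor. This is the same split as GEN_BLOCK(/5, 10, 5/) on IA, but a column and its non-zeros are now aligned by construction. The INDIRECT map gives 8, 9 and 3.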

Page 13

Example of improvement

Tree version:

```fortran
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ INDEPENDENT
FORALL (I = 3:N-2)
   A(I)%COL(:)%VAL = A(I-2)%COL(:)%VAL + A(I+2)%COL(:)%VAL
END FORALL
```

CSC version (JA: column starts):

```fortran
!HPF$ DISTRIBUTE DA(GEN_BLOCK(array))
!HPF$ INDEPENDENT
FORALL (I = 3:N-2)
   DA(JA(I):JA(I+1)-1) = DA(JA(I-2):JA(I-1)-1) + DA(JA(I+2):JA(I+3)-1)
END FORALL
```

In the CSC version the compiler cannot know the bounds JA(I-2) and JA(I-1)-1, so the generated code makes a global copy of DA with a broadcast:

```fortran
TMP(:) = global copy of DA(:) with BCAST
DO I = 3, N-2
   local_bound(DA(JA(I):JA(I+1)-1), lb, ub)
   DO J = lb, ub
      DA(J) = TMP(J1) + TMP(J2)   ! J1, J2: matching positions in columns I-2 and I+2
   END DO
END DO
```

With the tree, communications are needed on the block frontiers only, as with SHADOW in HPF2:

```fortran
local_bound(A(:), lb, ub)
TMP(lb:ub) = local copy of the local part of A(lb:ub)
Shadow_Update(TMP(:), -2, +2)
local_bound(A(3:N-2), lb, ub)
DO I = lb, ub
   A(I)%COL(:)%VAL = TMP(I-2)%COL(:)%VAL + TMP(I+2)%COL(:)%VAL
END DO
```
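The gap between the two generated codes can be quantified. The following Python sketch counts, for each processor, the columns it must receive for A(I) = A(I-2) + A(I+2) under a BLOCK distribution of the tree's first level. A shadow of width 2 is compared with a full broadcast. N = 16 and 4 processors in the usage tests are hypothetical values.

```python
def block_bounds(pid: int, n: int, nop: int):
    """Local bounds lb..ub (1-based) of processor pid under BLOCK."""
    bloc = -(-n // nop)
    lb = pid * bloc + 1
    return lb, min(n, lb + bloc - 1)


def needed_remote(pid: int, n: int, nop: int) -> list:
    """Columns read by pid's iterations of A(I) = A(I-2) + A(I+2), I = 3..N-2,
    that pid does not own: the shadow region it must receive."""
    lb, ub = block_bounds(pid, n, nop)
    needed = set()
    for i in range(max(lb, 3), min(ub, n - 2) + 1):
        needed.update((i - 2, i + 2))
    return sorted(c for c in needed if not lb <= c <= ub)


def broadcast_volume(pid: int, n: int, nop: int) -> int:
    """Columns received when the whole structure is copied to every processor."""
    lb, ub = block_bounds(pid, n, nop)
    return n - (ub - lb + 1)
```

An interior processor receives at most 4 columns (2 on each side), whatever N is. With the global copy it receives all N minus its local columns, for example 12 out of 16.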

Page 14

(Figure: run-time support layers. Arrays: DALIB on top of MPI. Trees/derived types: DALIB and TriDenT on top of MPI. Both stacks are illustrated with the example matrix, columns I to VIII.)

Page 15

(Figure, left: matrix-vector product, serial: times in seconds (0 to 90) for F90 Derived Type, F90 ADAPTOR Matrix (F77) and F90 ADAPTOR TriDenT. Right: parallel product (dense notations): relative efficiencies in % (70 to 100) on 1 to 16 processors, HPF2/Matrix vs HPF2/TREE. IBM SP2-LaBRI, 4096x4096.)

Page 16

•Advantages:
  •Fewer indirections
  •Fewer unknown alignments
  •Better compile-time analysis (locality and dependence)
  •Generic (defined by the user)
  •Low overhead
•Disadvantages:
  •Not necessarily implemented in HPF compilers: portability
  •Irregular codes must be rewritten (with derived types)

Page 17

Plan (same as Page 3)

Page 18

Inspection-Execution

Inspection: scan the part of the program under analysis to collect the information needed for optimization.
Execution: perform the actual computations, following the optimized scheme derived from the inspected information.

Original program (often an iterative scheme):

```fortran
DO STEP = 1, S
   DO I = 1, N
      A(I) = B(INDEX(I))
   END DO
   ! ... modify B
END DO
```

INSPECTION:

```fortran
DO I = 1, N
   IF (A(I) is local) THEN
      add INDEX(I) to local_index
   END IF
END DO
! exchange information on local_index (which indexes to send, which to receive)
```

EXECUTION:

```fortran
DO STEP = 1, S
   gather B(local_index(:)) into Copy_B
   I_local = 1
   DO I = 1, N
      IF (A(I) is local) THEN
         A(I) = Copy_B(I_local)
         I_local = I_local + 1
      END IF
   END DO
   ! ... modify B
END DO
```

Related work:
•PARTI: iterative scheme
•CHAOS: iterative and adaptive scheme (by steps)
  (integrated in Fortran D and the Vienna Fortran Compilation System)
•PILAR: iterative and multi-phase scheme, basic element = section; PARADIGM compiler
•ADAPTOR: TRACE directive, dynamic adaptive scheme
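The inspection/execution split above can be simulated sequentially. In the following Python sketch, NOP processors are modelled in one address space, A and B are assumed BLOCK-distributed, and processor ids start at 0. The inspector runs once. The executor runs at every STEP and reuses its result, which is how an iterative program amortizes the inspection cost.

```python
from collections import defaultdict


def owner(i: int, n: int, nop: int) -> int:
    """BLOCK owner (0-based processor) of 1-based element i."""
    bloc = -(-n // nop)
    return (i - 1) // bloc


def inspector(index, n, nop):
    """Run once: the B indexes each processor needs (in iteration order),
    and the resulting send lists keyed by (sender, receiver)."""
    local_index = [[] for _ in range(nop)]
    for i in range(1, n + 1):
        local_index[owner(i, n, nop)].append(index[i - 1])
    send_lists = defaultdict(list)
    for p in range(nop):
        for j in local_index[p]:
            send_lists[(owner(j, n, nop), p)].append(j)
    return local_index, dict(send_lists)


def executor(a, b, local_index, n, nop):
    """Run at every STEP: gather B(local_index(:)), then compute locally."""
    for p in range(nop):
        copy_b = [b[j - 1] for j in local_index[p]]  # the gather (communication)
        i_local = 0
        for i in range(1, n + 1):
            if owner(i, n, nop) == p:
                a[i - 1] = copy_b[i_local]
                i_local += 1
```

Because INDEX does not change, the send lists stay valid across steps, even though the values of B change between steps.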

Page 19

•ON HOME directive: controls the computation mapping

```fortran
!HPF$ ALIGN (I) WITH A(I) :: B, C
!HPF$ INDEPENDENT
DO I = 1, N
!HPF$ ON HOME (A(I))
   C(INDEX(I)) = A(I) * B(I)
END DO
```

Without ON HOME (owner computes on C(INDEX(I))), both operands are sent:

```fortran
DO I = 1, N
   IF (A(I) is local) THEN
      call Send(A(I) to Owner(C(INDEX(I))))
      call Send(B(I) to Owner(C(INDEX(I))))
   END IF
   IF (C(INDEX(I)) is local) THEN
      call Receive(TMP1 from Owner(A(I)))
      call Receive(TMP2 from Owner(A(I)))
      C(INDEX(I)) = TMP1 * TMP2
   END IF
END DO
```

With ON HOME (A(I)), the product is computed where A(I) and B(I) live, and a single value is sent:

```fortran
DO I = 1, N
   IF (A(I) is local) THEN
      TMP = A(I) * B(I)
      call Send(TMP to Owner(C(INDEX(I))))
   END IF
   IF (C(INDEX(I)) is local) THEN
      call Receive(TMP from Owner(A(I)))
      C(INDEX(I)) = TMP
   END IF
END DO
```

HPF2: communication optimizations with active processor sets
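The effect of ON HOME on communication volume can be counted directly. The following Python sketch assumes that A, B and C are BLOCK-distributed and aligned, and counts the values sent for C(INDEX(I)) = A(I) * B(I). Owner computes ships A(I) and B(I) whenever C(INDEX(I)) lives elsewhere. ON HOME (A(I)) ships only the product.

```python
def count_messages(index, n, nop, on_home_a):
    """Values sent for C(INDEX(I)) = A(I) * B(I), I = 1..N, with A, B, C aligned
    and BLOCK-distributed over nop processors (0-based)."""
    bloc = -(-n // nop)

    def owner(i):
        return (i - 1) // bloc

    sent = 0
    for i in range(1, n + 1):
        if owner(i) != owner(index[i - 1]):
            # owner computes: ship A(I) and B(I); ON HOME (A(I)): ship the product only
            sent += 1 if on_home_a else 2
    return sent
```

With a reversed INDEX on 2 processors, every iteration is remote: 16 values are sent under owner computes and 8 with ON HOME (A(I)).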

Page 20

Irregular Active Processor Sets

(Figure: the example matrix, columns I to VIII and rows 1 to 8, distributed by columns, with array B aligned with the columns of A.)

```fortran
!HPF$ ALIGN A(*,K) WITH B(K)
B(K) = SUM(A(K,:))
```

Only the owners of the non-zeros of row K (and of B(K)) take part in the sum:
row 1: ON HOME A(1,I) + ON HOME A(1,V)
row 2: ON HOME A(2,II) + ON HOME A(2,V)
row 3: ON HOME A(3,III)

•Fewer active processors in collective communications
•Fewer communications (reduction or broadcast)
•Fewer synchronizations

Extensions to the ON HOME directive:
!HPF$ ON HOME (A(K,:))
!HPF$ ON HOME (A(K,INDEX(K)))
!HPF$ ON HOME (A(K,J), J=I:VIII, J .eq. K .or. A(K,J) .ne. 0.0)
The set in the last form is written like a FORALL header, FORALL (J=I:VIII, J .eq. K .or. A(K,J) .ne. 0.0), where I:VIII ranges over the columns.
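The active processor set of the row sum can be computed from the sparse structure. The following Python sketch uses the example's CSC arrays. As a hypothetical column mapping, it reuses INDIRECT(/1,2,3,2,1,2,3,1/) from the tree-distribution slide. The set is the owner of B(K), which is also the owner of column K, plus the owners of the non-zeros A(K,J).

```python
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
IA = [1, 5, 2, 5, 3, 4, 6, 8, 1, 2, 5, 4, 6, 7, 8, 6, 7, 4, 6, 8]
COL_OWNER = [1, 2, 3, 2, 1, 2, 3, 1]  # hypothetical: INDIRECT(/1,2,3,2,1,2,3,1/) on columns


def active_set(k: int, col_owner=COL_OWNER) -> set:
    """Processors taking part in B(K) = SUM(A(K,:)) with ALIGN A(*,K) WITH B(K):
    the owner of B(K) (= owner of column K) plus the owners of non-zeros A(K,J)."""
    procs = {col_owner[k - 1]}
    for j in range(1, len(JA)):
        rows_of_col_j = IA[JA[j - 1] - 1 : JA[j] - 1]
        if k in rows_of_col_j:
            procs.add(col_owner[j - 1])
    return procs
```

Under this mapping, row 1 involves only processor 1 and row 3 only processor 3, instead of all three processors. This is the reduction in active processors, communications and synchronizations that the slide lists.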

Page 21

Cholesky Example: TREE and Set (Matrix with 65024 columns)

(Figure: the example matrix, columns I to VIII, rows 1 to 8.)

```fortran
DO K = 1, N
!HPF$ ON HOME (A(K,J), J = 1:K, J .eq. K .or. A(K,J) .ne. 0.0), NEW(TMP), BEGIN
   allocate (TMP(N))
   TMP(:) = 0.0
!HPF$ INDEPENDENT, REDUCTION (TMP(:))
   DO J = 1, K-1
      IF (A(K,J) .ne. 0.0) THEN
         call CMOD (TMP, A(:,J))   ! contribution of column J to column K
      END IF
   END DO
   A(:,K) = A(:,K) + TMP(:)
   call CDIV (A(:,K))              ! scaling of column K
   deallocate (TMP)
!HPF$ END ON
END DO
```

(Figure: times in seconds (20 to 200) on 1, 2, 4, 8 and 16 processors for versions V0 and Vset. IBM SP2-LaBRI, 2D grid 255x255.)
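The loop above is a left-looking, column-oriented Cholesky factorization. The following Python sketch uses dense storage and 0-based indices. The CMOD and CDIV names come from the slide, but their bodies are an assumption: the textbook column-modification and column-division kernels. The `!= 0.0` test is the sparse test of the slide. It is also what defines the ON HOME set, since only the owners of the columns J with A(K,J) ≠ 0 contribute to column K.

```python
import math


def cmod(tmp, l, k, j):
    """Column modification: subtract the contribution of column j from column k."""
    for i in range(k, len(l)):
        tmp[i] -= l[k][j] * l[i][j]


def cdiv(l, k):
    """Column division: L(k,k) = sqrt(A(k,k)), then scale the rest of column k."""
    l[k][k] = math.sqrt(l[k][k])
    for i in range(k + 1, len(l)):
        l[i][k] /= l[k][k]


def cholesky_left_looking(a):
    """Lower-triangular L with L L^T = A, computed column by column (0-based)."""
    n = len(a)
    l = [[float(a[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        tmp = [0.0] * n                 # TMP(:) = 0.0
        for j in range(k):              # DO J = 1, K-1
            if l[k][j] != 0.0:          # IF (A(K,J) .ne. 0.0): skip structural zeros
                cmod(tmp, l, k, j)
        for i in range(k, n):
            l[i][k] += tmp[i]           # A(:,K) = A(:,K) + TMP(:)
        cdiv(l, k)
    return l
```

For A = [[4, 2, 0], [2, 5, 3], [0, 3, 10]], column 0 does not contribute to column 2 because A(3,1) = 0. In a distributed run, its owner would therefore be left out of the reduction for K = 3.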

Page 22

Plan (same as Page 3)

Page 23

Irregular Iteration Space

```fortran
!HPF$ DISTRIBUTE A(*,BLOCK)

!HPF$ INDEPENDENT, REDUCTION(B)
DO J = 1, K-1
   IF (A(K,J) .ne. 0.0) THEN
      …
   END IF
END DO
```

(Figure: Cholesky, times in seconds (15 to 195) on 1, 2, 4, 8 and 16 processors for versions Vset and Vset+Loop. IBM SP2-LaBRI, 2D grid 255x255.)
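The irregular iteration space of this loop is the set of J < K with A(K,J) ≠ 0 that a processor owns. The following Python sketch computes that set by inspection on the example matrix. It assumes a BLOCK distribution of the 8 columns over 3 processors, with ids from 0. With the precomputed list, a processor runs only the iterations that pass the test instead of scanning its whole local range and testing each column.

```python
JA = [1, 3, 5, 6, 9, 12, 16, 18, 21]
IA = [1, 5, 2, 5, 3, 4, 6, 8, 1, 2, 5, 4, 6, 7, 8, 6, 7, 4, 6, 8]


def row_structure(ja, ia):
    """For every row, the sorted list of columns holding a non-zero (CSC transposed)."""
    rows = {}
    for col in range(1, len(ja)):
        for p in range(ja[col - 1], ja[col]):
            rows.setdefault(ia[p - 1], []).append(col)
    return rows


def block_owner(col, ncols, nprocs):
    bloc = -(-ncols // nprocs)  # HPF BLOCK size: ceiling(ncols / nprocs)
    return (col - 1) // bloc


def local_iterations(k, pid, rows, ncols, nprocs):
    """Inspected iteration space of processor pid for
    DO J = 1, K-1 ; IF (A(K,J) .ne. 0.0) ... under A(*,BLOCK)."""
    return [j for j in rows.get(k, []) if j < k and block_owner(j, ncols, nprocs) == pid]
```

For K = 8, only processor 1 has work (columns 4 and 6). Processor 0 does not even scan its columns 1 to 3.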

Page 24

Plan

•Optimizations at compile time
  •Irregular Data Structure (IDS)
  •A Tree to represent an IDS
•Optimizations at run time
  •Inspection-Execution principles
  •Irregular communications: irregular active processor sets
  •Irregular iteration spaces
  •Scheduling of loops with partial loop-carried dependencies
  •New data-parallel irregular operation: progressive irregular prefix operation
•Conclusion and Perspectives

Page 25

Loop with Partial Loop-Carried Dependencies

•Loop-carried dependencies:

```fortran
DO I = 1, N
   DO J = 1, I-1
      A(I) = A(I) + A(J)
   END DO
END DO
```

•Partial loop-carried dependencies:

```fortran
DO I = 1, N
   DO J = 1, I-1
      IF (TEST(I,J)) THEN
         A(I) = A(I) + A(J)
      END IF
   END DO
END DO
```

•Precomputable partial loop-carried dependencies (PPLD loop): TEST is never modified in the loop, so the dependencies can be computed before the loop runs

Page 26

PPLD Loop

```fortran
DO I = 1, N
!HPF$ ON HOME (A(J), J = 1:I, J .eq. I .or. TEST(I,J)), BEGIN
   B = 0.0
!HPF$ INDEPENDENT, REDUCTION(B)
   DO J = 1, I-1
      IF (TEST(I,J)) THEN
         B = B + A(J)
      END IF
   END DO
   A(I) = A(I) + B
!HPF$ END ON
END DO
```

Example with N = 11 on 4 processors:

| I | Owner(A(I)) | J with TEST(I,J) = TRUE | Set(I) |
|---|---|---|---|
| 1 | P1 | - | P1 |
| 2 | P2 | 1 | P1 P2 |
| 3 | P3 | 1 | P1 P3 |
| 4 | P4 | - | P4 |
| 5 | P1 | 1 4 | P1 P4 |
| 6 | P2 | 2 3 | P2 P3 |
| 7 | P3 | 3 | P3 |
| 8 | P4 | 4 | P4 |
| 9 | P1 | 1 4 5 8 | P1 P4 |
| 10 | P2 | 4 5 7 9 | P1 P2 P3 P4 |
| 11 | P3 | 6 7 | P2 P3 |

Without sets, every iteration I runs on P1, P2, P3 and P4: 11 steps.

With sets: 8 steps.

| Step | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| 1 | 1 | | | 4 |
| 2 | 2 | 2 | | |
| 3 | 3 | | 3 | |
| 4 | 5 | 6 | 6 | 5 |
| 5 | | | 7 | 8 |
| 6 | 9 | | | 9 |
| 7 | 10 | 10 | 10 | 10 |
| 8 | | 11 | 11 | |
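The Set(I) column and the 8-step table can be recomputed from the owners and the TEST relation. In the following Python sketch, the owners are cyclic over P1 to P4, as in the table. The execution rule is an assumption: each processor runs the iterations of its sets in increasing I, and an iteration starts only when it heads the list of every processor in its set and its TEST predecessors are finished. Under that rule, the simulation reproduces the 8 steps.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def owner(i: int, nprocs: int = 4) -> int:
    """Owner(A(I)) in the table: cyclic over P1..P4."""
    return (i - 1) % nprocs + 1


def active_set(i: int) -> set:
    """Set(I): owners of A(I) and of every A(J) with TEST(I,J) = TRUE."""
    return {owner(i)} | {owner(j) for j in TEST_TRUE[i]}


def in_order_steps() -> int:
    """Each processor runs the iterations whose Set contains it, in increasing I.
    An iteration runs in a step when it heads the list of every processor of its
    Set and all its TEST predecessors are finished. Returns the number of steps."""
    lists = {p: [i for i in sorted(TEST_TRUE) if p in active_set(i)] for p in range(1, 5)}
    done, steps = set(), 0
    while any(lists.values()):
        heads = {p: lst[0] for p, lst in lists.items() if lst}
        ready = {i for i in heads.values()
                 if all(heads.get(p) == i for p in active_set(i))
                 and all(j in done for j in TEST_TRUE[i])}
        if not ready:
            raise RuntimeError("deadlock")
        for lst in lists.values():
            if lst and lst[0] in ready:
                lst.pop(0)
        done |= ready
        steps += 1
    return steps
```

Because every processor follows the same global order (increasing I), the heads can never wait on each other in a cycle. This is why the in-order execution cannot deadlock, at the cost of ignoring some available parallelism.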

Page 27

PPLD Loop Scheduling

•One task is associated with each iteration
•Precomputable partial loop-carried dependencies = task graph
•Scheduling problem in the HPF context:
  •Known mapping (HPF data distribution => task mapping)
  •Data distribution => possibly multi-processor tasks
  •« Scheduling multi-processor tasks on dedicated processors »

Related work:
•Complexity: Drozdowski 97, Krämer 95: NP-hard problem
•Wennink 95: scheduling algorithm
•PYRROS / RAPID libraries: precomputable task graph with mono-processor tasks (inspection-execution)

Page 28

Scheduling Tasks Associated with a PPLD Loop

1) DAG generation: new SCHEDULE directive
2) Scheduling: simple and Wennink's scheduling
3) Execution: static / dynamic execution; single-thread / multi-thread execution
4) Experimental results

Page 29

SCHEDULE directive

Dependencies between iterations (obtained by inspection-execution):

```fortran
!HPF$ SCHEDULE (J = 1:I-1, TEST(I,J))
DO I = 1, N
!HPF$ ON HOME (A(J), J = 1:I, J .eq. I .or. TEST(I,J)), BEGIN
   B = 0.0
!HPF$ INDEPENDENT, REDUCTION(B)
   DO J = 1, I-1
      IF (TEST(I,J)) THEN
         B = B + A(J)
      END IF
   END DO
   A(I) = A(I) + B
!HPF$ END ON
END DO
```

| I | J with TEST(I,J) = TRUE |
|---|---|
| 1 | - |
| 2 | 1 |
| 3 | 1 |
| 4 | - |
| 5 | 1 4 |
| 6 | 2 3 |
| 7 | 3 |
| 8 | 4 |
| 9 | 1 4 5 8 |
| 10 | 4 5 7 9 |
| 11 | 6 7 |

(Figure: the resulting DAG, with an edge J → I for every TEST(I,J) = TRUE, drawn by levels: {1, 4}, {2, 3, 5, 8}, {6, 7, 9}, {10, 11}.)
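The DAG that the SCHEDULE directive lets the run time inspect can be built directly from the TEST table. The following Python sketch creates one edge J → I for each TEST(I,J) = TRUE and groups the iterations by level (longest path from a source). Because every predecessor J is smaller than I, increasing I is already a topological order.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def build_dag(test_true: dict) -> dict:
    """Successor lists: an edge J -> I for every TEST(I,J) = TRUE."""
    succ = {i: [] for i in test_true}
    for i in sorted(test_true):
        for j in test_true[i]:
            succ[j].append(i)
    return succ


def levels(test_true: dict) -> list:
    """Group iterations by longest path (in edges) from a source of the DAG.
    Since every J < I, increasing I is a topological order."""
    level = {}
    for i in sorted(test_true):
        level[i] = max((level[j] + 1 for j in test_true[i]), default=0)
    groups = {}
    for i in sorted(level):
        groups.setdefault(level[i], []).append(i)
    return [groups[lv] for lv in sorted(groups)]
```

The levels reproduce the figure: {1, 4}, {2, 3, 5, 8}, {6, 7, 9}, {10, 11}. All iterations within one level are independent.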

Page 30

Distributed Scheduling Algorithms

•Simple scheduling: local tasks only
•Order of task scheduling: priority criterion based on the critical path. On P1, the tasks 1 2 3 5 9 10 have priorities 4 3 3 3 2 1.
•Problem of scheduling coherence between processors (deadlock must be prevented): solved by a step-by-step scheduling algorithm
•Result: on each processor, a list for task execution

(Figure: the DAG of the example, each task labelled with its processors a, b, c, d (P1 to P4) and its priority.)

| Step | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| 1 | 1 | | | 4 |
| 2 | 2 | 2 | | 8 |
| 3 | 3 | | 3 | |
| 4 | 5 | 6 | 6 | 5 |
| 5 | 9 | | 7 | 9 |
| 6 | 10 | 10 | 10 | 10 |
| 7 | | 11 | 11 | |
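The priorities and the 7-step table can be reproduced by a small list scheduler. The following Python sketch is a centralized simulation, not the distributed algorithm of the thesis. It computes the static critical path (longest path in edges to a virtual End vertex). Then, at each step, it starts the ready tasks by decreasing priority, breaking ties by the lower index, as long as none of their processors is already busy in that step. On this example it yields the same steps as the table.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def procs(i: int) -> set:
    """Processors of task I: Set(I), with A(I) cyclic over P1..P4."""
    def owner(k):
        return (k - 1) % 4 + 1
    return {owner(i)} | {owner(j) for j in TEST_TRUE[i]}


def priorities() -> dict:
    """Static critical path: longest path, in edges, from each task to a virtual End."""
    succ = {i: [] for i in TEST_TRUE}
    for i, preds in TEST_TRUE.items():
        for j in preds:
            succ[j].append(i)
    prio = {}
    for i in sorted(TEST_TRUE, reverse=True):   # successors always have larger I
        prio[i] = 1 + max((prio[s] for s in succ[i]), default=0)
    return prio


def step_schedule() -> list:
    """Step-by-step list scheduling: at each step, ready tasks are considered by
    decreasing priority (ties: lower I first) and started if none of their
    processors is already busy in that step."""
    prio, done, steps = priorities(), set(), []
    while len(done) < len(TEST_TRUE):
        ready = sorted((i for i in TEST_TRUE
                        if i not in done and all(j in done for j in TEST_TRUE[i])),
                       key=lambda i: (-prio[i], i))
        busy, step = set(), []
        for i in ready:
            if not procs(i) & busy:
                busy |= procs(i)
                step.append(i)
        done |= set(step)
        steps.append(step)
    return steps
```

Task 8 runs on P4 at step 2, ahead of task 5, because task 5 is blocked by P1. Skipping a blocked task in this way is what shortens the schedule from 8 to 7 steps compared with in-order execution.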

Page 31

Scheduling

•Wennink's scheduling: multi-processor tasks + insertion principle

Execution list on P1:
Simple: 1 2 3 5 9 10
Wennink: 1 3 2 5 9 10

| Step | P1 | P2 | P3 | P4 |
|---|---|---|---|---|
| 1 | 1 | | | 4 |
| 2 | 3 | | 3 | 8 |
| 3 | 2 | 2 | 7 | |
| 4 | 5 | 6 | 6 | 5 |
| 5 | 9 | 11 | 11 | 9 |
| 6 | 10 | 10 | 10 | 10 |

Complexity:
•Simple: computations O(N log N), memory O(|E|)
•Wennink: computations O(N²), memory O(N² + |E|)

(Figure: the DAG of the example.)
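Whatever algorithm produces a step table, its validity can be checked mechanically. The following Python sketch verifies that no task starts before its TEST predecessors have finished, that no processor runs two tasks in the same step, and that every task runs exactly once. It then returns the number of steps. The two tables are the simple and Wennink schedules of the previous two slides.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def procs(i: int) -> set:
    """Processors of task I: Set(I), with A(I) cyclic over P1..P4."""
    def owner(k):
        return (k - 1) % 4 + 1
    return {owner(i)} | {owner(j) for j in TEST_TRUE[i]}


def makespan(steps: list) -> int:
    """Check a step table (one list of tasks per step) and return its length.
    Raises ValueError if a task starts before one of its TEST predecessors has
    finished, if two tasks of a step share a processor, or if a task is missing
    or repeated."""
    done = set()
    for step in steps:
        busy = set()
        for i in step:
            if i in done:
                raise ValueError(f"task {i} runs twice")
            if any(j not in done for j in TEST_TRUE[i]):
                raise ValueError(f"task {i} starts too early")
            if procs(i) & busy:
                raise ValueError(f"processor conflict on task {i}")
            busy |= procs(i)
        done |= set(step)
    if done != set(TEST_TRUE):
        raise ValueError("some task is never executed")
    return len(steps)


SIMPLE = [[1, 4], [2, 8], [3], [5, 6], [7, 9], [10], [11]]
WENNINK = [[1, 4], [3, 8], [2, 7], [5, 6], [9, 11], [10]]
```

Both tables are valid. Wennink's schedule finishes in 6 steps instead of 7, because it moves task 3 ahead of task 2 on P1, which lets task 7 and then task 11 start earlier.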

Page 32

Static execution / Dynamic execution

•HPF context: task costs are not known at compile time => unit costs
•Static critical path = longest path (in edges) to the virtual « End » vertex

Static scheduling: static order of execution
  a (P1): 1 2 3 5 9 10
  b (P2): 2 6 10 11
  c (P3): 3 6 7 10 11
  d (P4): 4 8 5 9 10

•Iterative program: the first iteration records the task times t1 to t11, then the tasks are re-scheduled (dynamic scheduling):
  a (P1): 1 3 2 5 9 10
  b (P2): 2 6 11 10
  c (P3): 3 7 6 11 10
  d (P4): 4 8 5 9 10

(Figure: the DAG with unit costs and its priorities, then the same DAG weighted by the recorded times t1 to t11, both ending at the virtual End vertex E.)
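Re-scheduling with recorded times amounts to replacing unit costs with measured costs in the critical-path computation. In the following Python sketch, the measured cost of 5 units for task 3 is a hypothetical value, chosen only to show how one slow task changes the priority order. With it, P1's order becomes 1 3 2 5 9 10, the dynamic list above. The sketch only sorts by priority. Actual execution order also depends on when tasks become ready.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def weighted_priorities(cost: dict) -> dict:
    """Critical path with task costs: cost(I) + the largest priority of a successor.
    With unit costs, this is the static priority (longest path in edges to End)."""
    succ = {i: [] for i in TEST_TRUE}
    for i, preds in TEST_TRUE.items():
        for j in preds:
            succ[j].append(i)
    prio = {}
    for i in sorted(TEST_TRUE, reverse=True):  # successors always have larger I
        prio[i] = cost[i] + max((prio[s] for s in succ[i]), default=0)
    return prio


def priority_order(tasks, prio):
    """Tasks sorted by decreasing priority, ties broken by lower I."""
    return sorted(tasks, key=lambda i: (-prio[i], i))
```

With unit costs, P1's tasks keep the static order 1 2 3 5 9 10. Raising t3 makes task 3 part of the critical path, so it is scheduled before task 2.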

Page 33

Single-Thread / Multi-Thread execution

(Figure: task 0 followed by two independent tasks 1 and 2.)

•Two independent tasks on the same processor with the same priority: which task goes first?
  •Single thread: the lower rank first
  •Multi-thread: both
  •User-mode thread system: Marcel, from PM² High Perf

(Figure: timelines of tasks K and K' showing computations, waiting for communication, and communications. With threads, communications are overlapped by computations.)

Page 34

Experimental Results: Matrix with 261121 columns

•Cholesky on a sparse matrix with column-block access
•Irregular data structure: TREE
•Distribution: INDIRECT (minimizing communications)

•VSet: V0 + Set
•Stat: VSet + SCHEDULE (static simple scheduling)
•Dyn: VSet + SCHEDULE (dynamic simple scheduling)
•Stat_th: Stat + threads
•W: VSet + SCHEDULE (dynamic Wennink's scheduling)

(Figure, left: relative efficiencies (global time), in % versus Vset (30 to 150), on 1, 2, 4, 8 and 16 processors, for Vset, Stat, Dyn, Stat_th and W. Right: relative efficiencies (re-execution only), in % versus Vset (90 to 210), same versions. IBM SP2-LaBRI, 2D grid 511x511.)

Page 35

Plan (same as Page 24)

Page 36

Irregular Progressive PREFIX Operation

•Irregular progressive PREFIX operation: found in PPLD loops

$$X_i = f\Big(X_i,\ \mathop{g}_{k \in B_i} X_k\Big), \qquad B_i \subseteq \{1, \dots, i-1\}$$

In the PPLD loop, f and g are both additions and $B_i = \{J < I : \mathrm{TEST}(I,J)\}$.

•Irregular coefficient (in %):

$$\text{coef} = \frac{1}{n} \sum_{i=1}^{n} \frac{|B_i|}{i}$$

•Exploit independencies with specific communication schemes
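Both definitions can be evaluated on the 11-iteration example of the PPLD slides. In the following Python sketch, `f` and `g` are passed in as functions. Leaving X_i unchanged when B_i is empty is an assumption of the sketch. With f = + and g = sum it is equivalent to adding an empty sum.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def irregular_prefix(x, b, f, g):
    """X_i = f(X_i, g over k in B_i of X_k), for i = 1..n in increasing order.
    b maps i to B_i (a subset of 1..i-1); indices are 1-based."""
    x = list(x)
    for i in range(1, len(x) + 1):
        if b[i]:  # empty B_i: X_i is left unchanged (assumption)
            x[i - 1] = f(x[i - 1], g(x[k - 1] for k in b[i]))
    return x


def irregular_coefficient(b, n):
    """Average over i = 1..n of |B_i| / i, in percent."""
    return 100.0 * sum(len(b[i]) / i for i in range(1, n + 1)) / n
```

Starting from X = (1, ..., 1), the operation gives 1 2 2 1 3 5 3 2 8 16 9, and the example's irregular coefficient is about 26%. A sparse Cholesky factor is far sparser, with the 0.1% quoted later for the Cholesky example.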

Page 37

Irregular Progressive PREFIX Operation

(Figure: the same small task graph (tasks 1 to 6) executed with asynchronous communications, versus with a synchronous REDUCTION.)

Page 38

Irregular Progressive PREFIX Operation

PREFIX directive/clause: differs from the REDUCTION clause

Two formulations of the same computation:

```fortran
DO I = 1, N
   DO J = I+1, N
      IF (TEST(J,I)) THEN
         A(J) = A(J) + A(I)
      END IF
   END DO
END DO
```

```fortran
DO I = 1, N
   B = 0.0
! with the clause: !HPF$ INDEPENDENT, PREFIX(B)
!HPF$ INDEPENDENT, REDUCTION(B)
   DO J = 1, I-1
      IF (TEST(I,J)) THEN
         B = B + A(J)
      END IF
   END DO
   A(I) = A(I) + B
END DO
```

Directive form: !HPF$ PREFIX(B)

Code generated with PREFIX:

```fortran
Inspection(A, TEST)
DO I = lb, ub                          ! ON HOME A(I)
   Finalize(A(I))                      ! receive the contributions sent previously
   DO J = I+1, N
      IF (TEST(J,I)) THEN
         A'(J) = A'(J) + A(I)          ! partial sum, sent when ready
      END IF
   END DO
END DO
```

Code generated with REDUCTION:

```fortran
DO I = 1, N                            ! Set(I)
   B = 0.0
   DO J = lb, ub                       ! ON HOME A(J)
      IF (TEST(I,J)) THEN
         B = B + A(J)
      END IF
   END DO
   A(I) = A(I) + REDUCTION(B)
END DO
```

(Figure: Comparisons: PREFIX vs REDUCTION (0 to 2) as a function of the irregular coefficient of TEST, with the equality line. IBM SP2-LaBRI.)
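The two loop formulations compute the same values. The following Python sketch runs both on the TEST relation of the PPLD example. Integer values avoid floating-point reordering effects. In the pull (REDUCTION) form, iteration I collects its contributions. In the push (PREFIX) form, each A(I), once final, is added to every later A(J) that depends on it. Push is the pattern that allows asynchronous sends instead of a synchronous reduction per iteration.

```python
TEST_TRUE = {1: [], 2: [1], 3: [1], 4: [], 5: [1, 4], 6: [2, 3], 7: [3], 8: [4],
             9: [1, 4, 5, 8], 10: [4, 5, 7, 9], 11: [6, 7]}


def depends(i: int, j: int) -> bool:
    """TEST(I,J): iteration I reads A(J)."""
    return j in TEST_TRUE[i]


def pull_form(a):
    """REDUCTION form: A(I) = A(I) + sum of A(J) over J < I with TEST(I,J)."""
    a = list(a)
    for i in range(1, len(a) + 1):
        b = 0
        for j in range(1, i):
            if depends(i, j):
                b += a[j - 1]
        a[i - 1] += b
    return a


def push_form(a):
    """PREFIX form: when A(I) is final, add it to every later A(J) with TEST(J,I)."""
    a = list(a)
    n = len(a)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if depends(j, i):
                a[j - 1] += a[i - 1]
    return a
```

The push form is correct because, when A(I) is read, every earlier iteration has already pushed its contribution into A(I). The Finalize step plays this role in the generated code.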

Page 39

Irregular Progressive PREFIX Operation: Cholesky Example (irregular coefficient = 0.1%)

(Figure, left: global time; right: re-execution time. Y axis: times versus V1 (V1/T, in %), from 75 to 475 (left) and 100 to 400 (right); 1, 2, 4, 8 and 16 processors; versions Vset, VsetP, Stat, StatP and PaSTiX. IBM SP2-LaBRI, 2D grid 511x511.)

Page 40

Conclusion

•TREE: an irregular data structure that gives more information at compile time
  Locality and dependence analysis => TriDenT
•Inspection/execution: for information still unknown at compile time => CoLUMBO
•Irregular active processor sets: a fundamental inspection/execution optimization; gains of up to a factor of 10
•Irregular iteration space: minor improvement
•Loops with partial loop-carried dependencies:
  •DAG associated with the loop iterations
  •Semi-automatic task scheduling at run time
  •PREFIX operation
•Inspection costs are repaid after a single iteration
•Experimental results: efficiency close to hand-written codes (time ratio between 1.25 and 2.5)

Page 41

Perspectives

•Integration in an HPF compiler: preliminary experiments
  •TREE: ADAPTOR
  •Set inspection/execution, PREFIX inspection/execution: NESTOR (Silber 98)
•Transposition to other parallel languages:
  •Irregular data structures: always a problem => TREE
  •Irregular iteration space
  •OpenMP: virtual shared memory => data distribution; irregular active processor sets