Physics Analysis on RISC Machines: Experiences at CERN - SACLAY, 20 June 1994


Transcript of "Analyse de Physique sur machines RISC : expériences au CERN" (Physics Analysis on RISC Machines: Experiences at CERN), SACLAY, 20 June 1994

Page 1

Physics Analysis on RISC Machines: Experiences at CERN

SACLAY, 20 June 1994

Frédéric Hemmer, Computing & Networks Division

CERN, Geneva, Switzerland

Page 2

CERN - The European Laboratory for Particle Physics

• Fundamental research in particle physics

• Designs, builds & operates large accelerators

• Financed by 19 European countries

• SFR 950M budget (operation + new accelerators)

• 3,000 staff

• Experiments conducted by a small number of large collaborations:

400 physicists, 50 institutes, 18 countries, using experimental apparatus costing 100s of MSFR

Page 3

Computing at CERN

• computers are everywhere

• embedded microprocessors

• 2,000 personal computers

• 1,400 scientific workstations

• RISC clusters, even mainframes

• estimate 40 MSFR per year (+ staff)

Page 4

Central Computing Services

• 6,000 users

• Physics data processing traditionally:

mainframes + batch

emphasis on:

reliability, utilisation level

• Tapes: 300,000 active volumes; 22,000 tape mounts per week

Page 5

Application Characteristics

• inherent coarse grain parallelism (at event or job level)

• Fortran

• modest floating point content

• high data volumes

– disks

– tapes, tape robots

• moderate, but respectable, data rates - a few MB/sec per fast RISC cpu

Obvious candidate for RISC clusters

A major challenge
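The first bullet is the decisive one: events are mutually independent, so the workload splits into self-contained jobs with no communication between them. A minimal sketch of that event-loop structure, in C with a hypothetical record format and function names (not CERN code):

/* Sketch of the event-loop structure that makes this workload
 * embarrassingly parallel: each event is read and processed
 * independently, so N jobs can each take their own input file.
 * EVENT, read_event() and process_event() are hypothetical. */
#include <stdio.h>

typedef struct { long id; char payload[8192]; } EVENT;

static int read_event(FILE *f, EVENT *ev)
{
    return fread(ev, sizeof(EVENT), 1, f) == 1;  /* one event per record */
}

static void process_event(const EVENT *ev)
{
    (void)ev;  /* reconstruction/analysis of one event: depends on
                  no other event, hence event-level parallelism */
}

int main(int argc, char **argv)
{
    FILE *f;
    EVENT ev;

    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    f = fopen(argv[1], "rb");        /* each job gets its own file */
    if (!f) { perror(argv[1]); return 1; }
    while (read_event(f, &ev))
        process_event(&ev);
    fclose(f);
    return 0;
}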

Page 6

CORE - Centrally Operated RISC Environment

• Single management domain

• Services configured for specific applications, groups

but common system management

• Focus on data - external access to tape and disk services from the CERN network, or even from outside CERN

Page 7

CORE Physics Services

[Diagram (les robertson /cn): equipment installed or on order January 1994. The CERN network links:

• CSF - Simulation Facility: 25 H-P 9000-735, H-P 9000-750

• PIAF - Interactive Analysis Facility: 5 H-P 9000-755, 100 GB RAID disk

• SHIFT - data intensive services: processors 24 SGI, 11 DEC Alpha, 9 H-P, 2 SUN, 1 IBM; embedded disk 1.1 TeraBytes

• Central Data Services - Shared Disk Servers (260 GBytes; 6 SGI, DEC, IBM servers) and Shared Tape Servers (7 IBM, SUN servers; 3 tape robots, 21 tape drives, 6 EXABYTEs)

• Home directories & registry: SPARCservers, Baydel RAID disks, tape juke box

• Scalable Parallel Processors: 8-node SPARCcenter, 32-node Meiko CS-2 (early 1994)

• consoles & monitors; SPARCstations]


Page 12

CSF - Central Simulation Facility

• second generation, joint project with H-P

[Diagram: an interactive host (H-P 750) with shared, load-balanced job queues; worker nodes on ethernet; tape servers reached via FDDI.]

• 25 H-P 735s - 48 MB memory, 400 MB disk

• one job per processor

• generates data on local disk

• staged out to tape at end of job

• long jobs (4 to 48 hours)

• very high cpu utilisation: >97%

• very reliable: >1 month MTBI

Page 13

SHIFT - Scalable, Heterogeneous, Integrated Facility

• Designed in 1990

• fast access to large amounts of disk data

• good tape support

• cheap & easy to expand

• vendor independent

• mainframe quality

• First implementation in production within 6 months

Page 14

Design choices

• Unix + TCP/IP

• system-wide batch job queues

“single system image”

target Cray style & service quality

• pseudo distributed file system - assumes no read/write file sharing

• distributed tape staging model (disk cache of tape files)

– the tape access primitives are

copy disk file to tape

copy tape file to disk
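A small sketch of what these two primitives imply for a job: the disk copy acts as a cache, and once a tape file has been copied to disk the job uses ordinary Unix I/O. All names here (file_cached, tape_to_disk, stage_in) and the path are hypothetical illustrations of the model, not the SHIFT interfaces:

/* Sketch of the "disk cache of tape files" model: tape access is
 * reduced to whole-file copies, and the job only ever reads the
 * disk copy. All names and paths are hypothetical illustrations. */
#include <stdio.h>
#include <unistd.h>

/* cache hit: does the disk copy already exist? */
static int file_cached(const char *disk_path)
{
    return access(disk_path, R_OK) == 0;
}

/* "copy tape file to disk" -- the only read primitive the model
 * needs; in SHIFT this is the role of the Remote Tape Copy System */
static int tape_to_disk(const char *vid, const char *disk_path)
{
    fprintf(stderr, "staging volume %s -> %s\n", vid, disk_path);
    /* ... tape unit allocation, label checking, copy ... */
    return 0;
}

/* stage in, then hand the caller an ordinary disk file */
static FILE *stage_in(const char *vid, const char *disk_path)
{
    if (!file_cached(disk_path) && tape_to_disk(vid, disk_path) != 0)
        return NULL;
    return fopen(disk_path, "rb");   /* plain Unix I/O from here on */
}

int main(void)
{
    FILE *f = stage_in("CUT322", "/shift/pool1/opaldst/file34");
    if (f) fclose(f);
    return 0;
}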

Page 15

The Software Model

[Diagram: disk servers, cpu servers, stage servers, tape servers and queue servers, connected by an IP network.]

Define functional interfaces: scalable, heterogeneous, distributed

Page 16

Basic Software

• Unix Tape Subsystem (multi-user, labels, multi-file operation)

• Fast Remote File Access System

• Remote Tape Copy System

• Disk Pool Manager

• Tape Stager

• Clustered NQS batch system

• Integration with standard I/O packages - FATMEN, RZ, FZ, EPIO, ..

• Network Operation

• Monitoring

Page 17

Unix Tape Control

• tape daemon

– operator interface / robot interface

– tape unit allocation / deallocation

– label checking, writing

Page 18

Remote Tape Copy System

• selects a suitable tape server

• initiates the tape-disk copy

tpread -v CUT322 -g SMCF -q 4,6 pathname

tpwrite -v IX2857 -q 3-5 file3 file4 file5

tpread -v UX3465 `sfget -p opaldst file34`
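Reading these examples (the flag meanings are inferred from context, not spelled out in the slides): -v appears to name the tape volume, -q the file sequence numbers to copy (4 and 6; 3 through 5), and -g a device or media group; the remaining arguments are the disk files involved in the copy. In the last line the backquoted sfget call, described on Page 21, allocates the destination file in the opaldst disk pool.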

Page 19

Remote File Access System - RFIO

high performance and reliability (improves on NFS)

• C I/O compatibility library

Fortran subroutine interface

• an rfio daemon is started on the remote machine by the open

• optimised for specific networks

• asynchronous operation (read ahead)

• optional vector pre-seek - ordered list of the records which will probably be read next
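A sketch of what the C compatibility library means in practice: the program makes open/read/close-shaped calls, and the library ships them to the rfio daemon on the file's server. The rfio_* names below follow the library's naming convention, but the exact header, signatures and host:path name are assumptions:

/* Minimal sketch of reading a remote file through RFIO's C I/O
 * compatibility layer. The rfio_* declarations below are assumed,
 * mirroring the Unix calls they replace. */
#include <stdio.h>
#include <fcntl.h>

extern int rfio_open(const char *path, int flags, int mode);
extern int rfio_read(int fd, void *buf, int nbytes);
extern int rfio_close(int fd);

int main(void)
{
    char buf[32768];   /* 32 KB records, as in the FDDI tests on Page 27 */
    int  fd, n;

    /* a "host:path" name reaches a file on a remote disk server;
     * the open starts an rfio daemon on that host */
    fd = rfio_open("shd01:/shift/shd01/data6/ws/panzer/file26", O_RDONLY, 0);
    if (fd < 0) { perror("rfio_open"); return 1; }

    while ((n = rfio_read(fd, buf, sizeof buf)) > 0)
        ;              /* process one record per iteration */

    return rfio_close(fd);
}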

Page 20

[Diagram: file systems on nodes sgi1, dec24 and sun5 grouped into one disk pool.]

a disk pool is a collection of Unix file systems, possibly on several nodes, viewed as a single chunk of allocatable space

Page 21

Disk Pool Management

• allocation of files to pools - pools can be public or private

• and to filesystems - capacity management

• name server

• garbage collection - pools can be temporary or permanent

• example:

sfget -p opaldst file26

may create a file like:

/shift/shd01/data6/ws/panzer/file26
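A sketch of the allocation step behind sfget: a pool is a list of file systems on one or more servers, and the pool manager picks one (here by most free space, one plausible capacity-management policy) before composing the returned path. Pool contents, policy and names are illustrative assumptions:

/* Sketch of sfget-style allocation: choose a file system from the
 * pool, then compose the path that is handed back to the caller.
 * Pool contents and the selection policy are illustrative only. */
#include <stdio.h>

struct fs { const char *mount; long free_mb; };

/* the 'opaldst' pool: file systems on several disk servers */
static struct fs pool[] = {
    { "/shift/shd01/data6", 1200 },
    { "/shift/shd02/data1",  800 },
    { "/shift/sgi3/data2",  4100 },
};

int main(void)
{
    char path[256];
    int  i, best = 0;

    /* capacity management: pick the file system with most free space */
    for (i = 1; i < (int)(sizeof pool / sizeof pool[0]); i++)
        if (pool[i].free_mb > pool[best].free_mb)
            best = i;

    /* compose <filesystem>/<group>/<user>/<file>, as in the example */
    snprintf(path, sizeof path, "%s/ws/panzer/%s", pool[best].mount, "file26");
    printf("%s\n", path);   /* -> /shift/sgi3/data2/ws/panzer/file26 */
    return 0;
}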

Page 22

Tape Stager

• implements a disk cache of magnetic tape files

• integrates: Remote Tape Copy System & Disk Pool Management

• queues concurrent requests for the same tape file

• provides full error recovery - restage and/or operator control on hardware/system error; initiates garbage collection if the disk is full

• supports disk pools & single (private) file systems

• available from any workstation

Page 23

Tape Stager

[Diagram: a user job on a cpu server issues “stagein tape, file”; stage control (an independent instance for each disk pool) allocates a disk file with “sfget file” and issues “tpread tape, file”; a tape server copies the tape file to the disk server (“rtcopy tape, file”); the job then reads the staged disk file via RFIO.]

Page 24

SHIFT Status - equipment installed or on order January 1994

group     configuration                                   cpu(CU*)  disk(GB)
OPAL      SGI Challenge 4-cpu + 8-cpu (R4400 - 150 MHz);      290       590
          two SGI 340S 4-cpu (R3000 - 33 MHz)
ALEPH     SGI Challenge 4-cpu (R4400 - 150 MHz);              216       200
          eight DEC 9000-400
DELPHI    Two H-P 9000/735                                     52       200
L3        SGI Challenge 4-cpu (R4400 - 150 MHz)                80       300
ATLAS     H-P 9000/755                                         26        23
CMS       H-P 9000/735                                         26        23
SMC       SUN SPARCserver 10, 4/630                            22         4
CPLEAR    DEC 3000-300AXP, 500AXP                              29        10
CHORUS    IBM RS/6000-370                                      15        15
NOMAD     DEC 3000-500 AXP                                     19        15
Totals                                                        775      1380

(for comparison: CERN IBM mainframe - 120 CU, 600 GB)

* CERN-Units: one CU equals approx. 4 SPECints

Page 25

Current SHIFT Usage

• 60% cpu utilisation

• 9,000 tape mounts per week, 15% writes - still some way from holding the active data on disk

• MTBI for cpu and disk servers: 400 hours for an individual server

• MTBF for disks: 160K hours

maturing service, but does not yet surpass the quality of the mainframe

Page 26

CORE Networking

[Diagram: an UltraNet 1 Gbps backbone (6 MBytes/sec sustained) links the SHIFT cpu servers, the SHIFT disk servers and the IBM mainframe; FDDI + GigaSwitch (2-3 MBytes/sec sustained) reaches the SHIFT tape servers; Ethernet + Fibronics hubs (aggregate 2 MBytes/sec sustained) serve the simulation service and the home directories, with a connection to CERN & external networks.]

Page 27

FDDI Performance (September 1993)

100 MByte disk file read/written sequentially using 32 KB records

client: H-P 735; server: SGI Crimson, SEAGATE Wren 9 disk

system    read          write
NFS       1.6 MB/sec    300 KB/sec
RFIO      2.7 MB/sec    1.7 MB/sec

Page 28

PIAF - Parallel Interactive Data Analysis Facility

(R. Brun, A. Nathaniel, F. Rademakers, CERN)

• the data is “spread” across the interactive server cluster

• the user formulates a transaction on his personal workstation

• the transaction is executed simultaneously on all servers

• the partial results are combined and returned to the user’s workstation
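This is a scatter/gather pattern: every worker runs the same transaction over its slice of the data, and the client merges the partial results. A small serial simulation of the idea in C, using a histogram-filling transaction with made-up data (all names hypothetical):

/* Serial simulation of the PIAF execution model: the same
 * transaction runs over each worker's slice of the data and the
 * partial histograms are summed on the client side. Data layout
 * and all names are illustrative assumptions. */
#include <stdio.h>

#define NWORKERS 5
#define NBINS    10

/* the transaction each worker executes on its own events */
static void fill_histogram(const double *events, int n, long hist[NBINS])
{
    int i, bin;
    for (i = 0; i < n; i++) {
        bin = (int)(events[i] * NBINS);          /* values in [0,1) */
        if (bin >= 0 && bin < NBINS)
            hist[bin]++;
    }
}

int main(void)
{
    double slice[NWORKERS][3] = {                /* data "spread" across workers */
        {0.11, 0.52, 0.93}, {0.24, 0.48, 0.07}, {0.66, 0.31, 0.99},
        {0.15, 0.75, 0.42}, {0.88, 0.05, 0.55},
    };
    long partial[NBINS], total[NBINS] = {0};
    int  w, b;

    for (w = 0; w < NWORKERS; w++) {             /* in PIAF: simultaneously */
        for (b = 0; b < NBINS; b++) partial[b] = 0;
        fill_histogram(slice[w], 3, partial);
        for (b = 0; b < NBINS; b++) total[b] += partial[b];   /* combine */
    }
    for (b = 0; b < NBINS; b++)
        printf("bin %d: %ld\n", b, total[b]);
    return 0;
}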

Page 29

PIAF Architecture

[Diagram: a PIAF client and display manager run on the user's personal workstation; they connect to the PIAF service, where a PIAF server coordinates five PIAF workers.]

Page 30

Scalable Parallel Processors

• embarrassingly parallel application - therefore in competition with workstation clusters

• SMPs and SPPs should do a better job for SHIFT than loosely coupled clusters

• computing requirements will increase by three orders of magnitude over next ten years

• R&D project started, funded by ESPRIT - GPMIMD2: 32-processor Meiko CS-2, 25 man-years development

Page 31

Conclusion

• Workstation clusters have replaced mainframes at CERN for physics data processing

• For the first time, we see computing budgets come within reach of the requirements

• Very large, distributed & scalable disk and tape configurations can be supported

• Mixed manufacturer environments work, and allow smooth expansion of the configuration

• Network performance is the biggest weakness in scalability

• Requires a different operational style & organisation from mainframe services

Page 32

Operating RISC machines

• SMPs are easier to manage

• SMPs require less manpower

• Distributed management not yet robust

• Network is THE problem

• Much easier than mainframes, and

• ... cost effective