CMS Computing: CCRC08 review

C. Charlot / LLR
LCGFR, 3 March 2008


LCG-France meeting, 03/03/2008, C. Charlot

CCRC08: goals

• Readiness test of the computing infrastructure before data taking
• Exercise combined with the other experiments
• Phase I: February 2008
  – Series of functional tests
  – Processing at the T0 and archiving, Cessy->CERN transfers, T0->T1->T2 transfers, T1 staging and processing, CAF tests
• Phase II: May 2008
  – Complete workflow, run simultaneously at all sites
  – Scale = 100%
  – 1 week of ramp-up, then 4 weeks of testing
• This presentation
  – Transfer tests
  – Staging test at the T1
  – Simultaneous processing test with ATLAS at the T1 (PIC, CC-IN2P3)

CCRC08: T0->T1 transfers

Goals: performance

• T0 (staged data) -> T1 (disk buffer): minimum = 25% of the 2008 rate, target = 40% of the 2008 rate, optimal = 50% of the 2008 rate
• T1 (disk) -> T1 (tape): 25% of the 2008 rate
• The target must be met for 3 consecutive days

Goals: stability

• T0 (staged) -> T1 (disk) -> T1 (tape)
• Stable transfer, with reception of a volume equivalent to 3 days at the above rate (10 TB for CC-IN2P3)
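As a rough cross-check of the stability figure, the average rate implied by a volume received over a number of days can be computed directly. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which the slides do not specify; the function name is illustrative.

```python
def average_rate_mb_s(volume_tb: float, days: float) -> float:
    """Average rate in MB/s needed to move volume_tb terabytes in `days` days.

    Assumes decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes); the slides
    do not say which convention is used.
    """
    seconds = days * 24 * 3600
    return volume_tb * 1e12 / 1e6 / seconds


# 10 TB received over 3 days (the CC-IN2P3 stability figure)
rate = average_rate_mb_s(10, 3)
print(f"{rate:.1f} MB/s")
```

Under these assumptions, 10 TB over 3 days corresponds to an average of about 38.6 MB/s.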

CCRC08: T0->T1 transfers (T0-T1)

CCRC08: T0->T1 transfers (T0-T1-CCIN2P3)

- Problems with srmv2 config
- Problems with dCache

CCRC08: T0->T1 transfers (T0-T1s)

CCRC08: T1->T1 transfers

Goals: performance

• Aggregate rate: export at 50% of the 2008 rate to at least 3 T1s
• Aggregate rate: import at 50% of the 2008 rate from at least 3 T1s
  – At least 1 T1 on another continent

CCRC08: T1->T1 transfers (summary of the 3 weeks)

CCRC08: T1->T1 transfers (T1-CCIN2P3->other T1s)

CCRC08: T1->T1 transfers (T1<->T1 summary)

CCRC08: T1->T2 transfers

CCRC08: T1-CCIN2P3->T2s

CCRC08: T1s->region T2s

CCRC08: reprocessing tests

• Reprocessing tests for CCRC08 in February:
  A) Migration from tape to buffer: pre-stage test.
  B) Reprocessing exercise: use all available CMS slots at T1s.
     Not done, since this had already been achieved at the CC-IN2P3 T1, with ~1000 slots used for processing of production data.
  C) Reprocessing exercise: test ATLAS and CMS reprocessing jobs on the same WN.

CCRC08: pre-staging tests

• Goal: measure latency, throughput and success rate for tape-to-buffer staging, for files which are kept only on tape (not on disk).
• Plan:
  + select one (or more) dataset(s) of 10 TB size existing at the T1.
  + remove all the files from disk (aka the T1 buffer).
  + fire the staging from tape to disk of all files.
  + measure some variables (detailed in the twiki).
• Schedule: to be done at sites (with help of site admins) during the 1st quarter of February. Done at all T1 sites.
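The three quantities named in the goal (latency, throughput, success rate) can be derived from per-file timestamps. The sketch below is only an illustration: the record layout, field names and sample values are hypothetical and do not come from the twiki or from the actual test tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageRecord:
    """One file recall request (hypothetical record layout)."""
    size_bytes: int
    requested_at: float           # seconds since the start of the test
    on_disk_at: Optional[float]   # None if the file never reached the buffer


def staging_metrics(records):
    """Return (success_rate, mean_latency_s, throughput_mb_s) for a staging test."""
    done = [r for r in records if r.on_disk_at is not None]
    if not done:
        return 0.0, None, 0.0
    success_rate = len(done) / len(records)
    mean_latency = sum(r.on_disk_at - r.requested_at for r in done) / len(done)
    # Throughput over the whole test window: first request to last file on disk.
    window = max(r.on_disk_at for r in done) - min(r.requested_at for r in records)
    throughput = sum(r.size_bytes for r in done) / 1e6 / window if window > 0 else 0.0
    return success_rate, mean_latency, throughput


# Made-up sample: three 1 GB files, one of which is never staged.
records = [
    StageRecord(1_000_000_000, 0.0, 100.0),
    StageRecord(1_000_000_000, 0.0, 200.0),
    StageRecord(1_000_000_000, 0.0, None),
]
success, latency, rate = staging_metrics(records)
```

Measuring throughput over the full window (rather than per file) is a design choice here: it includes the time spent waiting for tape mounts, which is what a site-level staging test is meant to expose.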

• Results obtained:
  Staging time for 10 TB: ~24 h (except RAL, IN2P3 and CNAF)

CCIN2P3: pre-staging tests

• dCache HPSS interface: HPSS -> HPSS_Disk -> dCache_Disk (farm access).
• A 1 GB file needs ~140 s to complete the process HPSS_Tape -> HPSS_Disk -> dCache_Disk. The last step (HPSS_Disk -> dCache_Disk) is achieved in ~45 s (22 MB/s), while HPSS_Tape -> HPSS_Disk takes the majority of the time, as expected (mounts, tape seek…).
• 140 s for file staging -> 7.1 MB/s for file recovery, on average, per drive.
• The test launched 3 parallel processes for staging -> 3 tapes (max.) were mounted at any time to recover files from the system. 7.1 MB/s/drive was achieved -> 23 MB/s, averaged.
• A last test, consisting of recalling 100 files from the same tape, has been performed. HPSS_Tape -> dCache_Disk took 19 min -> 12 s/file -> 88 MB/s: x10 better.
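The quoted rates follow from simple size/time arithmetic. The sketch below reproduces them, assuming decimal units (1 GB = 1000 MB) and, for the same-tape recall, assuming the 100 files are also 1 GB each (the slide does not state their size).

```python
FILE_MB = 1000.0  # 1 GB file; decimal units are an assumption


def rate_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds


full_chain = rate_mb_s(FILE_MB, 140)  # HPSS_Tape -> HPSS_Disk -> dCache_Disk
disk_copy = rate_mb_s(FILE_MB, 45)    # HPSS_Disk -> dCache_Disk only
three_drives = 3 * full_chain         # naive aggregate for 3 parallel recalls

# Same-tape recall: 100 files in 19 minutes (1 GB per file is assumed)
same_tape = rate_mb_s(100 * FILE_MB, 19 * 60)
speedup = same_tape / full_chain

print(f"per drive: {full_chain:.1f} MB/s, disk copy: {disk_copy:.1f} MB/s")
print(f"same tape: {same_tape:.1f} MB/s, speedup: x{speedup:.1f}")
```

The naive three-drive aggregate comes out slightly below the 23 MB/s quoted on the slide, which is a measured average rather than a derived figure.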

ATLAS+CMS processing test

• Goal: run ATLAS and CMS reprocessing jobs on the same WNs
  – Investigate performance and memory issues
  – Set up a new CE with updated middleware and dedicated queues
• Results: ATLAS and CMS jobs were run on a dedicated CE + WNs
  – 10 8-core worker nodes
  – It allowed the grid people to discover tricks in the LCG-CE glite-3.1
  – Discovered that at CC the jobs were submitted to all queues and that GlueCEStateStatus == "Production" was not taken into account
  – The max memory requirement was relaxed to allow for a memory study, but this was not looked at
  – Too few WNs to see any interference effects
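The queue-selection issue can be illustrated with a small filter on the published status. This is a sketch only: the records are plain dictionaries with hypothetical queue identifiers, not the output of a real information-system query, and "Closed" is used simply as an example of a non-"Production" status.

```python
def accepting_queues(queues):
    """Keep only the queues whose published status is "Production"."""
    return [q["id"] for q in queues if q.get("GlueCEStateStatus") == "Production"]


# Hypothetical queue records for illustration
queues = [
    {"id": "ce.example.org/queue-atlas", "GlueCEStateStatus": "Production"},
    {"id": "ce.example.org/queue-cms", "GlueCEStateStatus": "Production"},
    {"id": "ce.example.org/queue-test", "GlueCEStateStatus": "Closed"},
]
selected = accepting_queues(queues)
print(selected)
```

Without such a check, the third queue would receive jobs as well, which matches the behaviour reported above.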

Conclusions

• Good participation of the CC in the CCRC08 tests
  – Thanks to everyone for the effort
• The re-processing tests proved useful
  – Staging: throughput limited by the dCache->HPSS interface
  – To be planned: optimization of request handling, or ordering of files by tape by the user
• Difficulties with the transfers
  – srmv2 configuration problem, numerous dCache problems
  – CCRC08 (May) target of 100 MB/s from CERN
  – Stabilizing dCache appears urgent
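Ordering recall requests by tape on the user side could look like the sketch below. It is an illustration under stated assumptions: the request fields (`file`, `tape`, `position`) and the sample values are hypothetical, and a single drive serving requests strictly in order is assumed when counting mounts.

```python
from collections import defaultdict


def order_by_tape(requests):
    """Group recall requests by tape, then order each group by position on tape."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req["tape"]].append(req)
    ordered = []
    for tape in sorted(by_tape):
        ordered.extend(sorted(by_tape[tape], key=lambda r: r["position"]))
    return ordered


def count_mounts(requests):
    """Number of tape mounts if requests are served in order on a single drive."""
    mounts, current = 0, None
    for req in requests:
        if req["tape"] != current:
            mounts += 1
            current = req["tape"]
    return mounts


# Hypothetical requests, interleaved across two tapes
requests = [
    {"file": "a.root", "tape": "T002", "position": 7},
    {"file": "b.root", "tape": "T001", "position": 3},
    {"file": "c.root", "tape": "T002", "position": 1},
    {"file": "d.root", "tape": "T001", "position": 9},
]
ordered = order_by_tape(requests)
print(count_mounts(requests), "->", count_mounts(ordered))
```

In this toy case the interleaved order needs 4 mounts and the grouped order needs 2, the same effect that made the same-tape recall test markedly faster.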