Post on 13-Jan-2016
Computing grids in Europe, from life sciences to healthcare
V. Breton
Génopôle Day
IRISA
The concept of a grid infrastructure (1/2)
The Internet makes information available… but users must do everything themselves
  Format the information to be shared (web site)
  Identify, sort and analyse the available data
  Limits: expertise, storage, computing power
Evolution: web sites offering specialised services
  Limitations: the site's own resources (expertise, CPU, storage)
Grid infrastructure: enable user communities to share computing and storage resources and services
  Pooling of expertise, computing and storage
  Information processing, security
The concept (2/2)
A computing grid infrastructure lets communities of interest dynamically share remote, geographically distributed computing resources, both to store large volumes of data and to increase computing power.
A computing grid infrastructure comprises a heterogeneous set of computers, storage systems and even measuring instruments, linked by a high-speed network and by middleware. It gives users easy, transparent and secure access to all of these heterogeneous resources.
Grid technology is promising for both computing-intensive applications and knowledge discovery
To connect databases of heterogeneous content (biology and medicine) enabling new knowledge discovery (research, drug design), better guidance and information (healthcare professionals)
To increase computing power for analysis, imaging, simulation and modelling thus allowing these fields to take into account more data and therefore to provide more accurate results.
To address security (integrity, confidentiality, authentication, authorization, non-repudiation, availability)
The challenges of a life science grid
Technical challenges
  Data and tools integration: address data heterogeneity and the legacy of tools and standards
  Provide the infrastructure to deploy biomedical applications in a grid environment
Human challenge: involve end users in the grid game
  Grids are still very much in development and therefore user-unfriendly
  Training and support for university hospitals and biology/medicine research centres
Grid projects in bioinformatics
National projects in Europe
  France: GenoGRID, GRIPPS, Rugbi
  UK e-Science: Mygrid
  Netherlands: BioASP
  …
American projects
  Encyclopedia of Life (EOL)
  North Carolina Biogrid project, http://www.ncbiogrid.org/
  …
Projects in Asia
  Japan: OBIGrid, http://www.obigrid.org
European projects
  DataGrid (FP5)
  EGEE (FP6)
  Embrace (submitted in November 2003)
  …
Phylojava, web portal for phylogenetics on DataGrid
[Figure: computing time (0 to 2500 minutes) as a function of bootstrap number (0 to 500), for DataGrid EDG 1.4 and a SUN 900 MHz machine]
Bootstrapping : procedure to compute a consensus from a large number of independent phylogenetic tree calculations
Credit: T. Silvestre, BBE Lyon, http://pbil.univ-lyon1.fr/phylojava
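The bootstrap procedure defined on this slide can be illustrated with a minimal Python sketch. It is not Phylojava code: the toy alignment, the function names and the `closest_pair` stand-in (which replaces a real tree-building program) are all assumptions made for illustration. The point it shows is that every replicate is computed independently, which is why the replicates can be sent to the grid as separate jobs before the results are merged.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """Build one bootstrap replicate by drawing alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Hypothetical stand-in for tree building: return the two taxa with the fewest mismatches."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist = sum(x != y for x, y in zip(alignment[a], alignment[b]))
            if best is None or dist < best[0]:
                best = (dist, (a, b))
    return best[1]


def bootstrap_support(alignment, n_replicates, seed=0):
    """Fraction of replicates in which each pair of taxa is grouped together."""
    counts = Counter()
    for r in range(n_replicates):
        # Each replicate has its own generator, so it could run as a separate job.
        rng = random.Random(seed + r)
        counts[closest_pair(resample_columns(alignment, rng))] += 1
    return {pair: c / n_replicates for pair, c in counts.items()}
```

In a real analysis each replicate would yield a full tree and the merge step would build a consensus tree; the support fraction computed here is a simplified version of that merge.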
Example of 450 jobs handled on DataGrid
[Figure: number of jobs against time in minutes, shown for Italy, the Netherlands, the United Kingdom and the total]
Credit: T. Silvestre, BBE Lyon
The Encyclopedia of Life (EOL), http://eol.sdsc.edu/
Collaborative global project designed to catalog the complete proteome of every living species in a flexible reference system.
Open collaboration led by the San Diego Supercomputer Center
Three major development areas:
  Creating protein sequence annotations using the integrated genome annotation pipeline (iGAP)
  Storage of these annotations in a data warehouse where they are integrated with other data sources
  A toolkit area that presents the data to users in the presence of useful annotation and visualization tools
Mygrid
myGrid offers service-based middleware components
  Open source and free
  Open Grid Service Architecture-compliant
Allows the scientist to be at the centre of the Grid: personalisation
Generic middleware that suits the creation of bioinformatics applications
Inclusion of rich semantics to facilitate the scientific process
42 months, 20 months in. Available from http://www.mygrid.org.uk
  Prototype V0: technical and user requirements
  Prototype V1: release Sept 2004, some services available now
Future projects in Europe
EGEE: production infrastructure for research
  Successor to DataGrid
  70 partners around CERN
  32 million euros
  Start in April 2004
  Priority application domains: particle physics and biomedicine
Embrace: proposal for a network of excellence
  17 partners around EBI
  Develop the APIs to integrate biological data and bioinformatics tools into a grid environment
  Submitted to the 2nd call (Nov. 2003)
Health + Grid = HealthGrid
Health:
All levels of data and information, from molecule to population, needed to ensure better prevention, diagnosis and treatment of the citizen.
Grid:
An environment, created through the sharing of resources, in which heterogeneous and dispersed data as well as applications can be accessed by different partners according to their authorisation, without loss of information.
[Diagram, draft ideas, September 2002 (S. Nørager, Y. Paindaveine, DG-INFSO): HealthGRID connects databases, association, modelling and computation across the levels molecule, cell, tissue/organ, patient and public health, relating patient-related data to computational recommendation, towards individualised healthcare and molecular medicine]
A recent example of the potential grid impact
Last summer's heat wave killed more than 10,000 people in France
A mortality rate 10 to 50% above normal in retirement homes and hospitals went unnoticed for 2 weeks
A monitoring system could have raised the alarm much earlier
  Requirements: collect information from hospitals and/or funeral services on the number of casualties
  The Internet can do it through a centralized web portal
  Grid added value: database federation (data left in hospitals, à la BIRN) + a grid service for mortality rate computation and monitoring
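The monitoring idea on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of an existing service: the `HospitalSite` class, the per-site baseline and the 10% alarm threshold are hypothetical. Each site keeps its own records and only returns a small daily summary, which the monitoring service aggregates into an excess mortality rate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HospitalSite:
    """One federated site; the daily records never leave the hospital."""
    name: str
    daily_deaths: List[int]   # local data (hypothetical)
    baseline: float           # expected deaths per day (assumed known locally)

    def local_summary(self, day: int) -> dict:
        # Only this small summary is sent to the monitoring service.
        return {"observed": self.daily_deaths[day], "expected": self.baseline}


def excess_mortality(sites: List[HospitalSite], day: int) -> float:
    """Relative excess of observed over expected deaths across all sites."""
    summaries = [s.local_summary(day) for s in sites]
    observed = sum(s["observed"] for s in summaries)
    expected = sum(s["expected"] for s in summaries)
    return (observed - expected) / expected


def first_alarm_day(sites: List[HospitalSite], n_days: int,
                    threshold: float = 0.10) -> Optional[int]:
    """First day on which the excess exceeds the (assumed) alarm threshold."""
    for day in range(n_days):
        if excess_mortality(sites, day) > threshold:
            return day
    return None
```

With daily aggregation of this kind, a sustained excess would be flagged on the first day it crosses the threshold rather than after two weeks.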
Grid technology makes this possible today…
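Design of a grid
[Diagram: the physician's computers, each running enterGrid, reach the grid through user interfaces (UI) holding PKI X.509 certificate keys and JDL files. Behind the resource broker and information index (RB/II), computing elements (CE) with worker nodes (WN) are the compute machines, and storage elements (SE) are the machines storing data (radiological images).]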
The challenges of a healthcare grid
Allow every physician to access a reliable grid for daily practice
New actors: hospitals, physicians, healthcare administrations, big pharma, SMEs
Technical issues
  Networking, user interface
  Grid quality of service (stability, scalability, security, …)
Legal/ethical issues: obey the laws of the European countries with respect to personal data ownership and data transfer
Grid technology is not yet ready to address all these challenges, but it is time to build bridges towards this vision
In silico drug discovery
Goal: speed up the drug discovery cycle
Challenge: bridge gaps in the translation of basic research through to drug development, from the public to the private sector, and in the feedback of results from the private sector
The grid impact:
  High performance computing and data storage for massive docking
  Collaborative environment for searching new targets and sharing results while respecting privacy
Short term perspective: a grid for neglected diseases
  Non-profit drug discovery in a grid environment
Technical issues: security, data management
Multi-site therapy monitoring
Goal: reduce the time and cost of bringing a drug to market (100 million euros and 10 years)
Challenge: improve monitoring of multi-site clinical trials
The grid impact: moving away from a single centralized repository
Technical issues: security, data management
Intensity Modulated Radiation Therapy
Goal: deliver a variable fluence (number of particles per unit area) using complex geometries adapted to the tumour volume, depending on the beam incidence
Challenge: the treatment must be simulated through inverse dosimetry for each beam incidence and each geometry of the multi-leaf collimator, and the dose delivered to the patient must be validated
  30 beams x 2 minutes = 1 hour for each iteration of treatment validation
The grid impact: parallel execution of the different beam configurations on a cluster
  Reduce the time needed for treatment planning and increase the number of patients
Technical issues: security, quality of service
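The arithmetic on this slide (30 beams x 2 minutes = 1 hour) and the gain expected from parallel execution can be checked with a short Python sketch. It assumes ideal conditions that the slide does not state: every beam simulation takes the same time, and job submission, queueing and data transfer cost nothing. The function name and parameters are illustrative.

```python
import math


def validation_time_minutes(n_beams: int = 30,
                            minutes_per_beam: float = 2.0,
                            n_workers: int = 1) -> float:
    """Wall-clock time for one treatment-validation iteration.

    Beams are simulated independently, so with n_workers machines they are
    processed in ceil(n_beams / n_workers) successive rounds.
    Assumption: no scheduling or transfer overhead.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    rounds = math.ceil(n_beams / n_workers)
    return rounds * minutes_per_beam
```

On a single machine this gives the 60 minutes quoted on the slide. With one worker per beam the ideal time falls to 2 minutes; real grid overheads would add to that figure.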
Perform a trial for the introduction of the Grid approach in the biotechnology industry
Biomolecular simulations
Some health related FP5 grid projects in Europe
[Diagram: GEMSS combines simulation/imaging software, Grid software and solutions, bio-numeric modelling, medical expertise and legal aspects. The user site (software installed) handles pre- and post-processing, which could also be moved to the services portal, and reaches the simulation service system over the Internet or an intranet through Grid software (interface and service use). The simulation service system provides the medical simulation service plus networked compute resources, running the applications software and Grid software.]
Project Duration: 30 months, Commencement: 1.9.2002
http://www.gemss.de
GEMSS: GRID-enabled Medical Simulation Services
GEMSS - main goals
Main GEMSS goals:
  Secure and lawful Grid provision of medical simulation services
  Build 6 Grid-enabled medical prototype applications
  Build suitable middleware on top of common standards
  Install and evaluate a GEMSS test-bed
  Anticipate privacy, security and other legal concerns related to providing medical services over the Internet
Necessary assumption: no special-purpose network infrastructure
GEMSS - Technical Goals & Challenges
[Diagram: client and server communicating over the Internet. Labelled components: UI elements, application workflow, business processes, negotiation, invocation language API, logger, WSDL, SOAP marshalling, web service security, secure transfer (https), keystore + certificates, bridge to local PKI (?), Apache, conversational authorisation, local authorisation, registry, certification, application error recovery, QoS, compute resource, data storage, VNC]
Appropriate user interfaces & applications workflow
• Workflow enactor
• Negotiation
• Business processes
• Secure transfer, web services security, logging
Negotiated service provision
GEMSS - outlook
Status of work: GEMSS has finalised its design phase: client-server architecture based on web services (OGSA-compliant).
Outlook: prototype system, Feb. 2004; final GEMSS system, Aug. 2004
Contribution to standardisation: GEMSS is assessing its involvement in GGF, IETF or W3C. Final strategy has yet to be decided.
Teleradiology today: a solution in need of improvement
"It is not enough for a teleradiology system to be technically sound and lawfully installed to guarantee its practical success" (Franken et al.)
The difficulties exposed:
  Turnaround time for interpreting a radiological image too long (24 to 96 h)
  Teleradiology reports poorly suited to their purpose
  Many problems of oral or written communication, in particular concerning:
    Image quality
    Clinical information
Teleradiology experiment conducted between 1992 and 1995 between a rural hospital in Arkansas and university radiologists in Iowa City (USA)
eDiamond Digital Mammogram National Database
• Federate mammography databases
• Support the breast cancer screening programme in the United Kingdom
• Goals:
  Training tool for radiologists: e-learning
  Support for remote diagnosis
  Tools to support data mining and epidemiology
  Tools for automated quality control
Breast cancer screening in the United Kingdom
Today (film, paper)
  1.5M examinations in 2001-02
  65,000 women recalled for a second check
  8,545 cancers detected
  300 lives saved per year
  230 radiologists ("double reading")
Tomorrow (digital)
  2,000,000 examinations each year
  120,000 women recalled for a second check
  10,000 cancers detected
  1,250 lives saved per year
  230 radiologists ("double reading")
  50% growth in examinations
[Diagram: the grid presents metadata and images as a single resource ("Logical view is one resource"). DICOM data from 92 breast cancer screening centres is exposed through the grid together with computing services: Standard Mammo Format, data mining, CADe and CADi.]
Sample metadata table:
  Patient   Age   …   Image
  107258    55    …   1.dcm
  236008    62    …   2.dcm
  700266    59    …   3.dcm
  895301    58    …   4.dcm
  …         …     …   …
Challenge: image normalisation
Many parameters influence the appearance of the images
  Distribution of tissue density, tumours, microcalcifications
  Voltage, exposure time, …
Solution: SMF, Standard Mammogram Form
The principle
Top left: cranio-caudal image (plus breast outline and marks)
Top right: medio-lateral oblique image
Bottom: gallery of available images
Top left: 3D reconstruction showing the location of the tumour
Bottom left: SMF view of the different surface densities
Right: normalised image
MammoGrid – European federated mammogram database implemented on a GRID infrastructure
Main goals:
  Epidemiology of breast cancer from a European perspective
  Open source architecture
  Use of Grid in developing quality control techniques for breast cancer screening
  Development of some CADe techniques
http://lotus5.vitamib.com/hnb/mammogrid/mammogrid.nsf/Web/Frame?openform
MammoGrid - Federated System Solution
[Diagram: clinicians' workstations send a query over the GRID to a university database, a healthcare institute, a hospital in Italy and a hospital in the UK. Each site runs a local query and a local analysis on its own shared meta-data and analysis-specific data, and the query result is returned to the workstation.]
Massively distributed data AND distributed analyses
• Knowledge is stored alongside data
• Active (meta-)objects manage various versions of data and algorithms
• Small network bandwidth required
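The federated pattern on this slide (local query, local analysis, only small results returned) can be sketched in Python as follows. This is not MammoGrid code: the `Site` class, the function names and the split of records between two hospitals are assumptions for illustration. The patient identifiers and ages are the sample values from the metadata table shown earlier in the deck.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]


@dataclass
class Site:
    """One data-holding site; records and images stay where they are."""
    name: str
    records: List[Record]

    def run(self, predicate: Callable[[Record], bool],
            analysis: Callable[[List[Record]], Tuple[int, int]]) -> Tuple[int, int]:
        # Local query followed by local analysis; only the summary leaves the site.
        selected = [r for r in self.records if predicate(r)]
        return analysis(selected)


def count_and_age_sum(records: List[Record]) -> Tuple[int, int]:
    """Local analysis: a two-number summary instead of the records themselves."""
    return len(records), sum(r["age"] for r in records)


def federated_mean_age(sites: List[Site], min_age: int) -> Tuple[int, Optional[float]]:
    """Merge the per-site summaries at the clinician's workstation."""
    partials = [s.run(lambda r: r["age"] >= min_age, count_and_age_sum) for s in sites]
    n = sum(count for count, _ in partials)
    total = sum(age_sum for _, age_sum in partials)
    return n, (total / n if n else None)


# Hypothetical split of the sample records across two sites.
sites = [
    Site("hospital_uk", [
        {"patient": 107258, "age": 55, "image": "1.dcm"},
        {"patient": 236008, "age": 62, "image": "2.dcm"},
    ]),
    Site("hospital_italy", [
        {"patient": 700266, "age": 59, "image": "3.dcm"},
        {"patient": 895301, "age": 58, "image": "4.dcm"},
    ]),
]
```

Because each site returns two numbers rather than images, the network traffic stays small, which matches the "small network bandwidth required" point above.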
MammoGrid - Grid challenges: database
Large federated databases
  Images and metadata
Ontologies and metadata
  Image formation parameters
  Image features
  Clinical information
  Demographic data
Effective data mining of a rapidly growing database
Allow for complex queries involving executables
Medical image analysis clients are not Grid experts!
MammoGrid - Grid challenges: communications
Legal restrictions on access to data
  Clinicians, researchers, developers, Govt, …
Data resides in hospitals
  Firewall protected
Combining several databases
Secure file transfer
Large images to be transferred
Develop API for black box third party applications
Grids for medical development
Preparation and follow-up of medical missions in developing countries
Support to local medical centres in terms of second diagnosis, patient follow-up and e-learning
2 missions (Ibagué & Chuxiong) with the French NPO « Chaîne de l’Espoir » used as test cases
[Diagram: exchanges between the hand surgery medical centre in Ibagué, Chuxiong and Clermont-Ferrand/Paris: requests for second diagnosis, patient data consultation, second diagnosis, patient follow-up, interactive e-learning and video-conferences]
The grid impact:
• Improved telemedicine services
• Federation of patient databases
• Interactive e-learning (high bandwidth network required)
DataGrid: status of biomedical applications
Bio-informatics
  Phylogenetics: BBE Lyon (T. Sylvestre)
  Search for primers: Centrale Paris (K. Kurata)
  Bio-informatics web portal: IBCP (C. Blanchet)
  Parasitology: LBP Clermont, Univ. B. Pascal (N. Jacq)
  GRID platform for DNA microarray data analysis: Karolinska (R. Martinez)
  Geometrical protein comparison: Univ. Padova (C. Ferrari)
Medical imaging
  MR image simulation: CREATIS (H. Benoit-Cattin)
  Medical data and metadata management: CREATIS (J. Montagnat)
  Mammographies analysis: ERIC/Lyon 2 (S. Miguet, T. Tweed)
  Simulation platform for PET/SPECT based on Geant4: GATE collaboration (L. Maigne)
Status legend: deployed, tested on EDG, under preparation
Monte Carlo simulation on the grid
Credit: D. Hill, L. Maigne, R. Reuillot
Objective: speed up the execution of Monte Carlo codes
Method: study the impact of deploying Monte Carlo computations on the grid
Parallelisation studied: submission of tasks with independent seeds
GATE, a simulation platform for nuclear medical imaging and brachytherapy/radiotherapy
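The parallelisation scheme named on this slide, tasks submitted with independent seeds, can be illustrated with a minimal Python sketch. It does not use GATE or Geant4: a toy estimate of pi stands in for the physics simulation, and all function names are assumptions. Giving each task a distinct integer seed is a simple stand-in for properly independent random streams, which production codes obtain with dedicated parallel random number techniques.

```python
import random
from typing import List


def run_task(seed: int, n_samples: int) -> int:
    """One grid task: count random points that fall inside the unit quarter-disc."""
    rng = random.Random(seed)  # private generator, reproducible from its seed
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def merged_estimate(seeds: List[int], n_samples: int) -> float:
    """Merge the task results into one estimate of pi.

    Reusing a seed would duplicate a task's samples instead of adding
    statistics, so duplicates are rejected.
    """
    if len(set(seeds)) != len(seeds):
        raise ValueError("each task needs its own seed")
    total_hits = sum(run_task(seed, n_samples) for seed in seeds)
    return 4.0 * total_hits / (len(seeds) * n_samples)
```

Here the tasks run one after another in a loop; on a grid each `run_task` call would be a separate job, and only the hit counts would need to be collected.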
Impact of deployment on computing time
Credit: D. Hill, L. Maigne, R. Reuillot
Variation of computing time as a function of the number of tasks submitted in parallel
Variation of computing time as a function of the day of the month, for 100 tasks submitted in parallel
Simulated vascular reconstruction
Goals
  Supports the vascular surgeon in placement of bypasses and stents
  Predicts blood flow before operation
    Geometry obtained from medical scans
    Add the proposed intervention
Method
  Interactive Virtual Reality Environment to
    View scanned data
    Define proposed interventions
    View simulation results
  Advanced fluid code to simulate flows
  Grid for data access and computational resources
FP6 : the opportunities of a new paradigm
From pilot to production grid infrastructures (EGEE, …) committed to provide user communities with
  Training
  User support
  Access to resources
Need for collaborations with NoE and grid projects in the eHealth area to deploy large scale applications
Feedback eHealth specific requirements to middleware developers
Research infrastructures and testbeds
eHealth
Grids for complex problem solving
To widen the impact of the healthgrid cluster, the Healthgrid association
To disseminate information on grids for health
  Summaries and links to health related grid projects
  Available tools (software platforms, middleware, …)
  Tutorials
  Conferences
To foster exchange between projects, end users and technology developers
  To avoid reinventing the wheel
  To improve the take-up of grid technology
To promote standards
  Involvement in GGF Life Science Research group
Open to any new member
  Contact point: Y. Legrè (legre@clermont.in2p3.fr)
  Web site: http://www.healthgrid.org
Healthgrid conferences
Jointly organised by CERN, CNRS and EMBnet in collaboration with the eHealth unit, DG-INFSO
Meeting point for actors of grids for health
  End users = healthcare professionals / providers + academic & industrial researchers and developers from bio-informatics and medical-informatics
  Grid applications developers
  Technology developers
First conference in Lyon (January 2003)
Next conference in Clermont-Ferrand (January 29-30, 2004)
HealthGrid 2004
January 29th - 30th 2004, Clermont-Ferrand, France
http://clermont2004.healthgrid.org
The aims of this conference are to reinforce and promote awareness of the possibilities and advantages linked to the deployment of GRID technologies in health. In this context "Health" does not involve only clinical practice but covers the whole range of information from molecular level (genetic and proteomic information) through cells and tissues, to the individual and finally the population level (social healthcare).
Conclusion
From pilot grids (FP5) to production grids for research (FP6)
Grids in bioinformatics
  First prototype portals using grids for distributed computing
  Platform projects to deploy experiments
  To do: management of heterogeneous distributed data (-> Embrace)
Grids for health
  Pilot projects at national and European level
  Healthgrid initiative to create a community (computer scientists, grid users, healthcare actors)
Goal: provide a GRID platform for DNA microarray data analysis and gene regulation bioinformatics that permits predictions of the involvement of genes in the pathogenesis of human diseases.