Serveur terminologique inter-lingue : vers des ... · Microbiologie - Pôle Biologie) Unité de...
Transcript of Serveur terminologique inter-lingue : vers des ... · Microbiologie - Pôle Biologie) Unité de...
Serveur terminologique inter-lingue
vers des applications pratiques
Stefan J Darmoni MD PhD
TIBS LITIS Lab
Rouen University Hospital amp Rouen Medical School Normandy University France
LIMICS INSERM U1142
Email StefanDarmonichu-rouenfr
2016 ndash SIBM - Rouen University Hospital
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
Pocircle de Santeacute Publique JF Gehanno V Mangot MH Roux
Enseignement et recherche (sous-
section CNU 46-02)
Enseignement et recherche (sous-section CNU 46-04)
Enseignement et recherche (sous-section CNU 46-01)
JF Gehanno
SJ Darmoni
J Benichou
J Ladner
I Mareacutechal
V Merle
Deacutepartement de santeacute au travail et
pathologie professionnelle
(DSTPF)
Deacutepartement drsquoinformation et drsquoinformatique
meacutedicales (DIIM)
Deacutepartement drsquoappui agrave la
recherche clinique (DARC)
Deacutepartement de santeacute publique et
eacutepideacutemiologie (DSPE)
Deacutepartement
drsquoappui agrave la qualiteacute et agrave la seacutecuriteacute des
soins (DAQSS)
Deacutepartement
drsquohygiegravene hospitaliegravere (DHH)
Uniteacute de santeacute au travail des personnels hospitaliers
(USTPH)
Uniteacute drsquoinformation meacutedicale (UIM 1)
Centre drsquoinvestigation clinique (CIC)
Uniteacute de santeacute
publique et drsquoeacutepideacutemiologie clinique (USPEC)
Uniteacute de preacutevention des risques associeacutes aux soins (UPRAS)
Uniteacute de preacutevention des infections
nosocomiales (UPIN)
Uniteacute de pathologie professionnelle
(UPF)
Uniteacute drsquoinformatique
meacutedicale (UIM 2)
Centre de ressources biologiques (CRB)
Uniteacute transversale drsquoeacuteducation
theacuterapeutique du Patient (UTEP)
Uniteacute de seacutecuriteacute
transfusionnelle et heacutemovigilance
(USTH)
Laboratoire drsquohygiegravene
hospitaliegravere (LHH)
(Deacutepartement Microbiologie - Pocircle
Biologie)
Uniteacute de biostatistiques (UB)
Coordination des vigilances sanitaires
(CVS)
(+ activiteacute drsquoidentito-vigilance)
SIBM 2016 Department of Biomedical Informatics
MDs Engineers University Librarians Secretaries
Steacutefan DARMONI
(Prof) head Badisse DAHAMNA
Lina Soualmia (Senior
lecturer computer
science)
Benoit THIRION (Head
of the Medical Library)
Annie-Claude
LANCELEVEE
Jean-Philippe LEROY Ivan KERGOURLAY Chloeacute CABOT (PhD
student)
Catherine LETORD
(Pharmacist librarian) Sandrine VOURIOT
Nicolas GRIFFON
Julien GROSJEAN
(PostDoc)
Wiem CHEBIL (PhD
student) Gaeacutetan KERDELHUE
(librarian)
Matthieu SCHUERS
(GP PhD student)
Romain LELONG (PhD
student)
Melissa Mary (PhD
student Grant CIFRE
BioMeacuterieux)
Leacutea SEGAS (librarian)
2 residents in Public
Health
1 ou 2 residents in GP
Philippe MASSARI
(retired)
Financement CHU
Financement Universiteacute Rouen
Financement Conseil
Reacutegional Normandie Financement sur projets
de recherche
HeTOP content
2016 - CISMeF - Rouen University Hospital
- HeTOP is a repository dedicated to (European) health professionals and
students URL wwwhetopeu version NoSQL (beta-test) cisprochu-
rouenfrhetop
-HeTOP provides access to 69 health terminologies and ontology (TO)
available mainly in French or in English but also German Italian and
Dutch (European languages) but also with no Latin alphabet (Greek
Russian) and more recently outside Europe (Japanese Mandarin Arabic
amp Hebrew) (32 different languages)
- HeTOP can be used by humans and by computers via Web services
- The main objective of HeTOP is to provide an access to terminologies
and ontology allowing dynamic browsing and navigation
bull Free portal for over 20 TO eg MeSH CISMeF ICD10 amp CCAM
extended access restricted by IDpwd for academic use only
bull To be included next (Lausanne collaboration) extended ICD10 +
CHOP
HeTOP content
2014 - CISMeF - Rouen University Hospital
- HeTOP provides the usual data for each concept preferred terms
original code synonyms definitions and other attributes relations and
hierarchies
- Double (matricial) navigation
- among TO
- among languages
- Time consuming task gt 20 man-years (to develop) + 2 man-years per
year to maintain (integration amp maintenance of TO + mappings)
- Time consuming task to translate terminologies +++
- Several services on demand
- access to other resources on the Internet (PubMed CISMeF etc)
through a French InfoButton (InfoRoute)
- access to mappings tools (integrated in a beta version)
- acces to automatic indexing tool (ECMT)
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
Pocircle de Santeacute Publique JF Gehanno V Mangot MH Roux
Enseignement et recherche (sous-
section CNU 46-02)
Enseignement et recherche (sous-section CNU 46-04)
Enseignement et recherche (sous-section CNU 46-01)
JF Gehanno
SJ Darmoni
J Benichou
J Ladner
I Mareacutechal
V Merle
Deacutepartement de santeacute au travail et
pathologie professionnelle
(DSTPF)
Deacutepartement drsquoinformation et drsquoinformatique
meacutedicales (DIIM)
Deacutepartement drsquoappui agrave la
recherche clinique (DARC)
Deacutepartement de santeacute publique et
eacutepideacutemiologie (DSPE)
Deacutepartement
drsquoappui agrave la qualiteacute et agrave la seacutecuriteacute des
soins (DAQSS)
Deacutepartement
drsquohygiegravene hospitaliegravere (DHH)
Uniteacute de santeacute au travail des personnels hospitaliers
(USTPH)
Uniteacute drsquoinformation meacutedicale (UIM 1)
Centre drsquoinvestigation clinique (CIC)
Uniteacute de santeacute
publique et drsquoeacutepideacutemiologie clinique (USPEC)
Uniteacute de preacutevention des risques associeacutes aux soins (UPRAS)
Uniteacute de preacutevention des infections
nosocomiales (UPIN)
Uniteacute de pathologie professionnelle
(UPF)
Uniteacute drsquoinformatique
meacutedicale (UIM 2)
Centre de ressources biologiques (CRB)
Uniteacute transversale drsquoeacuteducation
theacuterapeutique du Patient (UTEP)
Uniteacute de seacutecuriteacute
transfusionnelle et heacutemovigilance
(USTH)
Laboratoire drsquohygiegravene
hospitaliegravere (LHH)
(Deacutepartement Microbiologie - Pocircle
Biologie)
Uniteacute de biostatistiques (UB)
Coordination des vigilances sanitaires
(CVS)
(+ activiteacute drsquoidentito-vigilance)
SIBM 2016 Department of Biomedical Informatics
MDs Engineers University Librarians Secretaries
Steacutefan DARMONI
(Prof) head Badisse DAHAMNA
Lina Soualmia (Senior
lecturer computer
science)
Benoit THIRION (Head
of the Medical Library)
Annie-Claude
LANCELEVEE
Jean-Philippe LEROY Ivan KERGOURLAY Chloeacute CABOT (PhD
student)
Catherine LETORD
(Pharmacist librarian) Sandrine VOURIOT
Nicolas GRIFFON
Julien GROSJEAN
(PostDoc)
Wiem CHEBIL (PhD
student) Gaeacutetan KERDELHUE
(librarian)
Matthieu SCHUERS
(GP PhD student)
Romain LELONG (PhD
student)
Melissa Mary (PhD
student Grant CIFRE
BioMeacuterieux)
Leacutea SEGAS (librarian)
2 residents in Public
Health
1 ou 2 residents in GP
Philippe MASSARI
(retired)
Financement CHU
Financement Universiteacute Rouen
Financement Conseil
Reacutegional Normandie Financement sur projets
de recherche
HeTOP content
2016 - CISMeF - Rouen University Hospital
- HeTOP is a repository dedicated to (European) health professionals and
students URL wwwhetopeu version NoSQL (beta-test) cisprochu-
rouenfrhetop
-HeTOP provides access to 69 health terminologies and ontology (TO)
available mainly in French or in English but also German Italian and
Dutch (European languages) but also with no Latin alphabet (Greek
Russian) and more recently outside Europe (Japanese Mandarin Arabic
amp Hebrew) (32 different languages)
- HeTOP can be used by humans and by computers via Web services
- The main objective of HeTOP is to provide an access to terminologies
and ontology allowing dynamic browsing and navigation
bull Free portal for over 20 TO eg MeSH CISMeF ICD10 amp CCAM
extended access restricted by IDpwd for academic use only
bull To be included next (Lausanne collaboration) extended ICD10 +
CHOP
HeTOP content
2014 - CISMeF - Rouen University Hospital
- HeTOP provides the usual data for each concept preferred terms
original code synonyms definitions and other attributes relations and
hierarchies
- Double (matricial) navigation
- among TO
- among languages
- Time consuming task gt 20 man-years (to develop) + 2 man-years per
year to maintain (integration amp maintenance of TO + mappings)
- Time consuming task to translate terminologies +++
- Several services on demand
- access to other resources on the Internet (PubMed CISMeF etc)
through a French InfoButton (InfoRoute)
- access to mappings tools (integrated in a beta version)
- acces to automatic indexing tool (ECMT)
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
SIBM 2016 Department of Biomedical Informatics
MDs Engineers University Librarians Secretaries
Steacutefan DARMONI
(Prof) head Badisse DAHAMNA
Lina Soualmia (Senior
lecturer computer
science)
Benoit THIRION (Head
of the Medical Library)
Annie-Claude
LANCELEVEE
Jean-Philippe LEROY Ivan KERGOURLAY Chloeacute CABOT (PhD
student)
Catherine LETORD
(Pharmacist librarian) Sandrine VOURIOT
Nicolas GRIFFON
Julien GROSJEAN
(PostDoc)
Wiem CHEBIL (PhD
student) Gaeacutetan KERDELHUE
(librarian)
Matthieu SCHUERS
(GP PhD student)
Romain LELONG (PhD
student)
Melissa Mary (PhD
student Grant CIFRE
BioMeacuterieux)
Leacutea SEGAS (librarian)
2 residents in Public
Health
1 ou 2 residents in GP
Philippe MASSARI
(retired)
Financement CHU
Financement Universiteacute Rouen
Financement Conseil
Reacutegional Normandie Financement sur projets
de recherche
HeTOP content
2016 - CISMeF - Rouen University Hospital
- HeTOP is a repository dedicated to (European) health professionals and
students URL wwwhetopeu version NoSQL (beta-test) cisprochu-
rouenfrhetop
-HeTOP provides access to 69 health terminologies and ontology (TO)
available mainly in French or in English but also German Italian and
Dutch (European languages) but also with no Latin alphabet (Greek
Russian) and more recently outside Europe (Japanese Mandarin Arabic
amp Hebrew) (32 different languages)
- HeTOP can be used by humans and by computers via Web services
- The main objective of HeTOP is to provide an access to terminologies
and ontology allowing dynamic browsing and navigation
bull Free portal for over 20 TO eg MeSH CISMeF ICD10 amp CCAM
extended access restricted by IDpwd for academic use only
bull To be included next (Lausanne collaboration) extended ICD10 +
CHOP
HeTOP content
2014 - CISMeF - Rouen University Hospital
- HeTOP provides the usual data for each concept preferred terms
original code synonyms definitions and other attributes relations and
hierarchies
- Double (matricial) navigation
- among TO
- among languages
- Time consuming task gt 20 man-years (to develop) + 2 man-years per
year to maintain (integration amp maintenance of TO + mappings)
- Time consuming task to translate terminologies +++
- Several services on demand
- access to other resources on the Internet (PubMed CISMeF etc)
through a French InfoButton (InfoRoute)
- access to mappings tools (integrated in a beta version)
- acces to automatic indexing tool (ECMT)
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
HeTOP content
2016 - CISMeF - Rouen University Hospital
- HeTOP is a repository dedicated to (European) health professionals and
students URL wwwhetopeu version NoSQL (beta-test) cisprochu-
rouenfrhetop
-HeTOP provides access to 69 health terminologies and ontology (TO)
available mainly in French or in English but also German Italian and
Dutch (European languages) but also with no Latin alphabet (Greek
Russian) and more recently outside Europe (Japanese Mandarin Arabic
amp Hebrew) (32 different languages)
- HeTOP can be used by humans and by computers via Web services
- The main objective of HeTOP is to provide an access to terminologies
and ontology allowing dynamic browsing and navigation
bull Free portal for over 20 TO eg MeSH CISMeF ICD10 amp CCAM
extended access restricted by IDpwd for academic use only
bull To be included next (Lausanne collaboration) extended ICD10 +
CHOP
HeTOP content
2014 - CISMeF - Rouen University Hospital
- HeTOP provides the usual data for each concept preferred terms
original code synonyms definitions and other attributes relations and
hierarchies
- Double (matricial) navigation
- among TO
- among languages
- Time consuming task gt 20 man-years (to develop) + 2 man-years per
year to maintain (integration amp maintenance of TO + mappings)
- Time consuming task to translate terminologies +++
- Several services on demand
- access to other resources on the Internet (PubMed CISMeF etc)
through a French InfoButton (InfoRoute)
- access to mappings tools (integrated in a beta version)
- acces to automatic indexing tool (ECMT)
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
HeTOP content
2014 - CISMeF - Rouen University Hospital
- HeTOP provides the usual data for each concept preferred terms
original code synonyms definitions and other attributes relations and
hierarchies
- Double (matricial) navigation
- among TO
- among languages
- Time consuming task gt 20 man-years (to develop) + 2 man-years per
year to maintain (integration amp maintenance of TO + mappings)
- Time consuming task to translate terminologies +++
- Several services on demand
- access to other resources on the Internet (PubMed CISMeF etc)
through a French InfoButton (InfoRoute)
- access to mappings tools (integrated in a beta version)
- acces to automatic indexing tool (ECMT)
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Methods
2013 - CISMeF - Rouen University Hospital
To integrate terminologies and ontology into EHTOP
three steps are necessary
(1) designing a meta-model into which each terminology
and ontology can be integrated
(2) developing a process to include terminologies into
EHTOP
(3) building and integrating existing and new inter amp
intra-terminology semantic harmonization into EHTOP
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
HeTOP generic model
7
Compliant with
ISO Terminology model
More simple
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Validation
Ontology tools
Integration OWL instances
Raw data
DB
Parsers
Model
OWL instance
Format
2013 - CISMeF - Rouen University Hospital
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Methods amp technologies (1)
HeTOP
db
Application server
(Apache Tomcat)
Clients
EHTOP
service
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Methods amp technologies (2)
PTS
db
bull Oracle 111g (optimizations amp partitionning) =gt
NoSQL in 2015
HeTOP
service
bull Java J2EE
bull CISMeF APIs
bull Apache Tomcat
bull Infinispan cache layer
bull Cross-browser (Vaadin framework)
=gt new framework in 2016 (INSA Rouen Engineering
School)
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Croslingual Health Multi-
TerminologyOntology Portal
First version before HeTOP (French amp English)
URL httpptschu-rouenfr
Access for humans and coumputers (Web services)
Since September 2010 daily used by CISMeF team to index manually and
automatically Web resources
Since January 2011 MeSH is freely available (500 unique users per
working day)
Teaching tool Rouen Medical School (since Sept 2010) to teach anatomy
and rare diseases
Terminology auditing HPOOrphanet
TO translations into French FMA HPO SNOMED CT MEDLINEplus
Restricted access to the other terminologies (2250 registred)
Cooperation with BioPortal Clement Jonquet amp Mark Musen (ANR
Jeunes Chercheurs project SIFR)
11
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2011 - CISMeF - Rouen University Hospital
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2011 - CISMeF - Rouen University Hospital
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Three main terminology servers
in health
UMLS NIH Bethesda (USA)
Plus de 150 TO
Essentiellement en anglais
La reacutefeacuterence internationale pour la diffusion mais pas pour la consultation
BioPortal NCBO Stanford (USA)
Plus de 400 T0 (beaucoup en biologie avec peu de concepts)
Essentiellement en anglais (silo pour les autres langues)
La reacutefeacuterence pour poster une ontologie
HeTOP SIBM Rouen (France)
70 T0 en 32 langues
La reacutefeacuterence inter-lingue (navigation entre les langues) et dans le monde francophone
Grosjean J et coll An Approach to Compare Bio-Ontologies Portals Stud Health Technol
Inform 20142051008-1012 17
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
HeTOP main figures
Terminologies
amp ontologies
Concepts Synonymes Deacutefinitions Relations amp
hieacuterarchies
25 gt 580 000 gt 840 000 gt 220 000 gt 1 200 000
May 2010
Terminologies Concepts Synonymes Deacutefinitions Relations
32 gt 980 000 gt 2 300 000 gt 220 000 gt 4 000 000
May 2011
April 2013
Terminologies Concepts Synonymes Deacutefinitions Relations
45 asymp 1 620 000 asymp 3 700 000 asymp 220 000 asymp 5 500 000
February 2016
Terminologies Concepts
in English
Concepts
in French
Terms Attributes Relations
69 (17 UMLS) 2358544 1171340 9134636 15348985 9697000
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Main figures
Registered users gt 2 600
traffic 15 000 hitsday (6 to 700 users per working day)
2014 - CISMeF - Rouen University Hospital
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Terminologies in French AND
in UMLS (2015AA)
Source vocabulary
(Translator) [N lang]
Number of Strings
(PT + syn + acro)
Number of CUIs
( Translation in French)
MeSH Fr [14] 105758 41229
MedDRA Fr [9] 73860 73608
WHO-ART Fr [5] 3631 3091
MTHMST Fr 1833 1636
ICPC2 [19] 702 722 (100)
MeSH Fr HeTOP
Descriptors (INSERM) [16] 105274 27747 (100)
Supplementary Concepts
(CISMeF) [2]
58514 42693 (1824)
Concepts (INSERM amp
CISMeF) [2]
99184 94052 (2620)
2015 - CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
( of translation Fr)
ICD10 Fr (WHO) [12] 26337 12143 (100)
ICD10 PCS (CISMeF) [2] 7297 7297 (5)
ICD9 Fr (WHO) [1] 10716 7356 (100)
ICDO Fr (WHO+CISMeF) [3] 1462 1362 (100)
ICF (WHO) [2] 1496 1495
FMA Fr (U of Washington) 4564 4452
FMA Fr in HeTOP (CISMeF) [7] 16631 16369 (2020)
ICNP [2] 2811 1158 (92)
SNOMED Int [2] 139792 96756 (9451)
ATC (WHO) [3] 5834 5757 (100)
SNOMED CT 126492 (4185) 20145- CISMeF - Rouen University Hospital
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Terminologies in UMLS NOT
in French
Source vocabulary
(Translator) [N lang]
Number of Strings
FrEn of tranlsations
Number of CUIs via
CISMeF mappings
WHO-ICPS [2] 424 105
RADLEX (CISMeF) [2] 9465423132230 240
LOINC (APHP amp SFIL)
[2]
58950 58500 (6071)
MEDLINEplus Fr
(CISMeF amp LIMSI) [2]
849 846 (100)
NCIT Fr (CISMeF) [2] 57827 56727 (6040)
2015 - CISMeF - Rouen University Hospital
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Terminologies in French that
are not included in UMLS
Source vocabulary
(Translator) [N lang]
Number of Strings Number of CUIs
OMIM Fr (CISMeF) [2] 7770 6904 (8884)
HRDO (Orphanet) [2] 13535id100 4943
HPO (CISMeF) [2] 11127119089344 1541
CCAM (procedure) [1] 10121 0
BNPC (toxicology) [2] 91751 11539
Q-Codes 184
hellip including interface
terminologies
in biology and imaging
Overall number of distinct CUI with at least one French translation in HeTOP
asymp 333239 vs asymp 88000 in UMLS (x379)
108 millions of RDF triplets (big data in health) in 2014
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
HeTOP relationships
(examples amp numbers)
21082015
Source Term
(Terminology)
Target Term
(Terminology)
Number of
relations in
HeTOP
UMLSalignment Myocardial Infarction
(MeSH)
Myocardial infarction
NOS
(SNOMED Int)
537978
CISMeFmanual Riedel thyroiditis
(HRDO)
Riedelrsquos thyroiditis
(MedDRA)
56265
CISMeFexact appetite
stimulants
(ATC)
Appetite stimulated
(WHOART)
653709
CISMeFSupervised Gonadotropin
releasing hormone
(MeSH)
Luteotropin-releasing
factor
(FMA)
286561
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
bull Formal representation of complex clinical data structures =
none
bull Formal representation of physiological models = none
bull Temporal relations = none
bull Data quality = based on T0 quality and point of view
bull Formalism amp reasoning capabilities = none
bull Collaborative editingsearchingsharing tools = collaboration
with BioPortal to share tools (Clement Jonquet)
bull TO versioning = not yet provided by HeTOP
bull Semantic resources distributiondissemination processes =
56 T0 avalable in OWL format (latest version)SKOSRDF in
several languages
HeTOP limits
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Other tools integrated HeTOP
ECMT Extracteur de Concepts Multi Terminologiques
Able to extract health concepts from any text eg discharge summary in frac12 second
(NoSQL)
Valorization with Alicante SME
Used in daily practice in the Catholic University Hospital of Lille France Dr Arnaud
Hansske around one million discharge summaries indexed with ECMT
InfoRoute a French InfoButton
URL inforoutechu-rouenfr
Access to a contextualized knowledge based on semantic expansion based on
manual amp supevised mapping among terminologies
MTHeTOP tool to perform automatic mappings amp translations
Generic semantic search engine
DocrsquoCISMeF (URL doccismefchu-rouenfr) on grey literature about health in French
on the Internet (105 resources)
LISSA (URL wwwlissafr) a PubMed in French (07 x 106 citations drsquoarticles)
RIDOPI search engine in EHR (8 x 106 discharge summaries in Rouen around 109
health concepts in these summaries 108 numerical data in Rouen)
26
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Master M2IBM ndash Lina Soualmia amp Catherine Duclos ndash LIMampBio UFR SMBH Paris XIII
RI-BI - Lina Soualmia - Universiteacute de
Rouen
27
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Metrics of ECMT (April 2015
CLEF eHealth in French)
28
TP FP FN P R F1
Anatomy 142 149 54 04880 07245 05832
Chemistry 153 38 108 08010 05862 06670
Devices 13 12 6 05200 06842 05909
Disorders 375 96 209 07962 06421 07109
Geography 14 4 7 07778 06667 07179
Live Beings 125 38 31 07669 08013 07837
Objects 3 16 28 01579 00968 01200
Phenotype 14 35 17 02857 04516 03500
Physiology 60 33 74 06452 04478 05286
Procedure 195 105 109 06500 06414 06457
Overall 1094 526 643 06753 06298 06518
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Results obtained in 2015 and 2016 from the training
set (832 documents) provided by CLEF e-Health
2015indexed with ECMT
2015 participation in the CLEF e-Health challenge
(httpssitesgooglecomsiteclefehealth2015)
bull Addresses clinical named entity recognition in languages other than
English
2016 Deduplication of identical terms from multiples terminologies
Reevaluation in the 2016 CLEF e-Health challenge
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Detailed 2016 results obtained from the training set
(832 documents) provided by CLEF e-Health
2015indexed wit
Disparities between UMLS semantic groups
ANAT (anatomy) DISO (disorders) ICD10 MeSH FMAhellip
OBJC (objects food) PHEN (natural environmental
phenomenon or process) few terminological resources
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
2016 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 151 340 64 03075 07023 04278 216 275 46 04399 08244 05737
CHEM 172 172 106 05000 06187 05531 257 87 63 07471 08031 07741
DEVI 17 22 9 04359 06538 05231 23 16 5 05897 08214 06866
DISO 465 485 267 04895 06352 05529 789 161 134 08305 08548 08425
GEOG 26 7 12 07879 06842 07324 28 5 11 08485 07179 07778
LIVB 170 124 94 05782 06439 06093 223 71 67 07585 07690 07637
OBJC 8 19 36 02963 01818 02254 8 19 36 02963 01818 02254
PHEN 13 47 41 02167 02407 02281 15 45 40 02500 02727 02609
PHYS 81 79 74 05063 05226 05143 102 58 67 06375 06036 06201
PROC 276 296 159 04825 06345 05482 384 188 96 06713 08000 07300
Overall 1 379 1 591 862 04643 06154 05293 2 045 925 565 06886 07835 07330
2015 Entity Recognition exact match Entity Recognition inexact match
TP FP FN Precision Recall F1 TP FP FN Precision Recall F1
ANAT 90 401 498 01833 01531 01668 225 266 155 04582 05921 05166
CHEM 98 246 1 064 02849 00843 01301 250 94 211 07267 05423 06211
DEVI 7 32 56 01795 01111 01373 22 17 29 05641 04314 04889
DISO 276 674 1 782 02905 01341 01835 784 166 537 08253 05935 06904
GEOG 23 10 31 06970 04259 05287 28 5 25 08485 05283 06512
LIVB 94 200 381 03197 01979 02445 196 98 153 06667 05616 06096
OBJC 4 23 117 01481 00331 00541 12 15 105 04444 01026 01667
PHEN 11 49 130 01833 00780 01095 19 41 117 03167 01397 01939
PHYS 41 119 441 02563 00851 01277 93 67 305 05813 02337 03333
PROC 146 426 860 02552 01451 01850 405 167 307 07080 05688 06308
Overall 790 2 180 5 360 02660 01285 01732 2 034 936 1 944 06848 05113 05855
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Semantic harmonization
mapping alignment
Three methods employed
URL httpcisprochu-rouenfrMT_EHTOP
Conceptual Same CUI
Other relations close match BT-NT NT-BT (SKOS)
On UMLS (n=17 included in HeTOP)
NLP More or less same algorithm of automatic indexing
Bag of words
on n(n-1)2 TO (included in the HeTOP)
Statitistical Co-occurrnece matrix
CCAM-ICD10 CCAM-LPP
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
33
Conceptual Same CUI
NLP Bag of words
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Systegraveme drsquoinformation du SIBM
Donneacutees du dossier
patient
N = 109 concepts
meacutedicaux
WEB
Fournisseurs de
reacutefeacuterences
bibliographiques
Fournisseurs de
terminologies
Etc
Donneacutees
cliniques
Donneacutees
omiques
Reacutefeacuterences drsquoarticles
scientifiques
francophones
N = 900 000
Ressources gratuites
francophones
N = 110 000
Terminologies de
santeacute
N = 68
23 millions concepts
Alignement
et traduction ECMT
MTHeTOP
InfoRoute
indexation
2007
1995
2012
2010
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
Many thanks
2011 - CISMeF - Rouen University Hospital
Questions
Email StefanDarmonichu-rouenfr
01062016
01062016