XML and databases: SANDRE training course, Pierre LAGARDE

  • Data versus documents

  • Data-centric documents / structured XML: used for data exchange and built to be processed by a computer. Characteristics: regular structures, fine-grained data, no mixed content, and element order does not matter. Examples: scientific data.

  • Example of a data-centric XML file: XML-SANDRE

  • Document-centric documents / unstructured XML: designed to be read by humans. Characteristics: irregular structures, coarse-grained data (one element can be the whole document!), mixed content, and element order matters. Such documents often do not come from a database. Examples: a book, a document produced with Word...
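
    The distinction can be checked mechanically. The following minimal Python sketch uses only the standard library. It flags mixed content, meaning an element that holds both non-blank text and child elements. Both sample documents and their element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(root: ET.Element) -> bool:
    """True if some element holds both non-blank text and child elements."""
    for node in root.iter():
        children = list(node)
        if not children:
            continue  # a leaf element holding only text is not mixed content
        # Text before the first child, or tail text after any child, makes it mixed.
        texts = [node.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            return True
    return False

# Hypothetical samples: one record-like fragment, one prose-like fragment.
DATA_CENTRIC = """<Parametre>
  <CdParametre>1340</CdParametre>
  <NomParametre>Nitrates</NomParametre>
</Parametre>"""

DOCUMENT_CENTRIC = """<para>The <term>nitrate</term> content is measured
with the method described in <ref>NF T 90-012</ref>.</para>"""

print(has_mixed_content(ET.fromstring(DATA_CENTRIC)))      # False
print(has_mixed_content(ET.fromstring(DOCUMENT_CENTRIC)))  # True
```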

  • Main solutions for managing XML: XML documents can be kept in the file system, in a relational or object database, or in an XML database (XML-enabled or native XML database). Application mapping and XML middleware link the documents to the database.

  • Storing and retrieving data

  • Solutions for data: map the XML structure, use an XML language, or use XSLT to transform the XML structure into the structure the database expects.

    Diagram: XML documents pass through middleware into an XML-enabled database. A mapping links the XML structures (XML schema) to the database schema.

  • Mapping the files to the database schema. The mapping covers elements, attributes and text. It ignores the physical structure (CDATA sections, encoding, ...) and the logical structures (processing instructions, comments, order, ...).

    Two mapping approaches: table-based mapping and object-relational mapping. An import followed by an export does not produce the same file, although it does produce the same data.
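
    Python's standard `xml.etree.ElementTree` parser can illustrate this loss of physical and logical structure. By default it drops comments and turns CDATA sections into plain text. The sketch below uses a hypothetical document. The re-exported text differs from the original, but the extracted data is identical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a comment and a CDATA section.
ORIGINAL = "<Parametre><!-- reference value --><Code>1340</Code><Nom><![CDATA[Nitrates]]></Nom></Parametre>"

def import_then_export(xml_text: str) -> str:
    """Parse (import) and serialise again (export) with the default parser settings."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")

def extract_data(xml_text: str) -> dict:
    """The data the mapping cares about: child element names and their text."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}

exported = import_then_export(ORIGINAL)
print(exported)  # <Parametre><Code>1340</Code><Nom>Nitrates</Nom></Parametre>
print(exported == ORIGINAL)                              # False: not the same file
print(extract_data(exported) == extract_data(ORIGINAL))  # True: the same data
```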

  • Table-based mapping: transfer the data into relational tables.

    Example data: 1340 Nitrates Validé 1987-06-01 2004-12-20T00:00:00 DIREN de bassin ILE-de-FRANCE et CENTRE NO3- Nitrates; methods: 24 Essais des eaux - Dosage colorimétrique des ions nitrate (NF T 90-012 - Février 1952), 25 Essais des eaux - Dosage des nitrates (NF T 90-012 - Août 1975)

    Target tables: Parameter, Methods
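
    Here is a minimal table-based mapping sketch in Python. An in-memory SQLite database stands in for the relational target. The element, table and column names are hypothetical. Linking each method to a single parameter is a simplifying assumption.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical element names: the real SANDRE schema uses its own tags.
XML = """<Parameter>
  <Code>1340</Code>
  <Name>Nitrates</Name>
  <Status>Validé</Status>
  <Methods>
    <Method><Code>24</Code><Name>Essais des eaux - Dosage colorimétrique des ions nitrate (NF T 90-012 - Février 1952)</Name></Method>
    <Method><Code>25</Code><Name>Essais des eaux - Dosage des nitrates (NF T 90-012 - Août 1975)</Name></Method>
  </Methods>
</Parameter>"""

def create_tables(conn: sqlite3.Connection) -> None:
    # One table per repeating structure: the parameter and its methods.
    conn.execute("CREATE TABLE parameter (code INTEGER PRIMARY KEY, name TEXT, status TEXT)")
    conn.execute(
        "CREATE TABLE method (code INTEGER PRIMARY KEY, "
        "parameter_code INTEGER REFERENCES parameter(code), name TEXT)"
    )

def load(conn: sqlite3.Connection, xml_text: str) -> None:
    # Each element's text becomes a column value; nested Method elements become rows.
    root = ET.fromstring(xml_text)
    code = int(root.findtext("Code"))
    conn.execute(
        "INSERT INTO parameter (code, name, status) VALUES (?, ?, ?)",
        (code, root.findtext("Name"), root.findtext("Status")),
    )
    for method in root.iterfind("Methods/Method"):
        conn.execute(
            "INSERT INTO method (code, parameter_code, name) VALUES (?, ?, ?)",
            (int(method.findtext("Code")), code, method.findtext("Name")),
        )

conn = sqlite3.connect(":memory:")
create_tables(conn)
load(conn, XML)
rows = conn.execute(
    "SELECT p.name, m.code FROM parameter p JOIN method m ON m.parameter_code = p.code ORDER BY m.code"
).fetchall()
print(rows)  # [('Nitrates', 24), ('Nitrates', 25)]
```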

  • Table-based mapping with Oracle, in two operations

    java OracleXML getXML -conn "jdbc:oracle:thin:@localhost:1521:orcl" -user "system/admin" "select * from parametres"

    Diagram: OracleXML (XSU) works on its own XML structure, and XSLT converts between that structure and the target one.
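
    The two operations can be sketched in Python, with SQLite standing in for Oracle. Step 1 dumps a query result in a canonical ROWSET/ROW shape, modelled loosely on XSU's default output. Step 2 reshapes that result. Because the standard library has no XSLT processor, plain Python stands in for the XSLT stylesheet. The column names code_parametre and nom_parametre, and the target structure, are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_rowset(cursor: sqlite3.Cursor) -> ET.Element:
    """Step 1: one ROW per result row, one upper-case child element per column."""
    columns = [d[0].upper() for d in cursor.description]
    rowset = ET.Element("ROWSET")
    for num, row in enumerate(cursor, start=1):
        row_elem = ET.SubElement(rowset, "ROW", num=str(num))
        for column, value in zip(columns, row):
            ET.SubElement(row_elem, column).text = str(value)
    return rowset

def to_target(rowset: ET.Element) -> ET.Element:
    """Step 2 (stand-in for XSLT): reshape the canonical XML into a hypothetical target."""
    out = ET.Element("Parametres")
    for row in rowset.iterfind("ROW"):
        parametre = ET.SubElement(out, "Parametre", code=row.findtext("CODE_PARAMETRE"))
        parametre.text = row.findtext("NOM_PARAMETRE")
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parametres (code_parametre INTEGER, nom_parametre TEXT)")
conn.execute("INSERT INTO parametres VALUES (1340, 'Nitrates')")
canonical = rows_to_rowset(conn.execute("SELECT * FROM parametres"))
print(ET.tostring(canonical, encoding="unicode"))
print(ET.tostring(to_target(canonical), encoding="unicode"))
```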

  • Demonstration with Oracle (requires Oracle 8i, 9i or 10g)

    Create (or reuse) a dedicated relational schema

    To display XML documents: java OracleXML getXML -user "system/admin" "select * from parametres"

    To load XML documents: java OracleXML putXML -user "system/admin" -fileName nvpara.xml parametres

  • Object-relational mapping: data is modelled as trees of objects

    Diagram: XML document -> (XML data binding) -> objects view -> (object/relational mapping) -> database
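
    The following Python sketch covers only the data-binding half of the diagram. It turns the XML tree into a tree of dataclass objects and back. The class and element names are hypothetical, and the object/relational half is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

@dataclass
class Method:
    code: int
    name: str

@dataclass
class Parameter:
    code: int
    name: str
    methods: List[Method] = field(default_factory=list)

def bind(xml_text: str) -> Parameter:
    """XML data binding: turn the element tree into an object tree."""
    root = ET.fromstring(xml_text)
    return Parameter(
        code=int(root.findtext("Code")),
        name=root.findtext("Name"),
        methods=[
            Method(int(m.findtext("Code")), m.findtext("Name"))
            for m in root.iterfind("Methods/Method")
        ],
    )

def unbind(parameter: Parameter) -> str:
    """Reverse direction: serialise the object tree back to XML."""
    root = ET.Element("Parameter")
    ET.SubElement(root, "Code").text = str(parameter.code)
    ET.SubElement(root, "Name").text = parameter.name
    methods = ET.SubElement(root, "Methods")
    for m in parameter.methods:
        method = ET.SubElement(methods, "Method")
        ET.SubElement(method, "Code").text = str(m.code)
        ET.SubElement(method, "Name").text = m.name
    return ET.tostring(root, encoding="unicode")

XML = (
    "<Parameter><Code>1340</Code><Name>Nitrates</Name><Methods>"
    "<Method><Code>24</Code><Name>Dosage colorimétrique des ions nitrate</Name></Method>"
    "<Method><Code>25</Code><Name>Dosage des nitrates</Name></Method>"
    "</Methods></Parameter>"
)
parameter = bind(XML)
print([m.code for m in parameter.methods])  # [24, 25]
print(unbind(parameter) == XML)             # True
```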

  • Object-relational mapping in Oracle: two solutions, object views and XMLType

    Demonstration of an object view

    Demonstration of XMLType: native XML in Oracle?

    Very complex to create

  • Query languages: extracting XML data from a database

    Three approaches: a custom development, an extended SQL, or an XML query language. Tables: Parameter, Methods

  • An extended SQL: standard SQL with new functions. SQL/XML is an extension introduced with ANSI/ISO SQL:2003. Example: select XMLElement("Code", code_parametre) from parametres

    Nested example: select XMLElement("Parametre", XMLElement("CodeParametre", XMLAttributes('SANDRE' as "schemeAgency"), code_parametre), XMLElement("NomParametre", nom_parametre)) from parametres
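
    This Python sketch shows what the nested query produces for one row. It builds the same element tree with ElementTree, using an in-memory SQLite table as a stand-in for the SQL/XML-capable database. The sample row is hypothetical, and the serialization a real server returns may differ in whitespace.

```python
import sqlite3
import xml.etree.ElementTree as ET

def parametre_element(code: int, nom: str) -> str:
    # XMLElement("Parametre",
    #   XMLElement("CodeParametre", XMLAttributes('SANDRE' AS "schemeAgency"), code_parametre),
    #   XMLElement("NomParametre", nom_parametre))
    parametre = ET.Element("Parametre")
    code_elem = ET.SubElement(parametre, "CodeParametre", schemeAgency="SANDRE")
    code_elem.text = str(code)
    ET.SubElement(parametre, "NomParametre").text = nom
    return ET.tostring(parametre, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parametres (code_parametre INTEGER, nom_parametre TEXT)")
conn.execute("INSERT INTO parametres VALUES (1340, 'Nitrates')")
results = [
    parametre_element(code, nom)
    for code, nom in conn.execute("SELECT code_parametre, nom_parametre FROM parametres")
]
print(results[0])
# <Parametre><CodeParametre schemeAgency="SANDRE">1340</CodeParametre><NomParametre>Nitrates</NomParametre></Parametre>
```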

  • XML query language: a programming language for querying collections of XML data. XQuery is developed by the W3C (Working Draft in 2005). It uses XPath, can combine multiple sources, and is specific to XML.

  • XQuery principles: FLWR expressions

    For ... Let ... Where ... Return ...

    Examples (Tamino syntax, where input() returns the documents of the current collection):

    let $a := input()/SI_DC return concat("Actually, there are ", count($a), " networks.")

    for $a in input()/SI_DC/Bdd where $a/TypeBdd/CdTypeBanque = '3' return $a

    for $a in input()/SI_DC/Bdd, $b in input()//BddRdd where $b/Bdd/CdBdd = $a/CdBdd return ($b/Bdd/CdBdd, $a/LbBdd)
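
    The last two queries can be mirrored with Python's ElementTree. ElementTree supports only a subset of XPath, so the where clauses become list-comprehension filters. This minimal sketch runs over a hypothetical SI_DC document that reuses the element names from the queries. Its structure and values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical SI_DC document reusing the element names of the queries.
SI_DC = """<SI_DC>
  <Bdd><CdBdd>B1</CdBdd><LbBdd>Base 1</LbBdd><TypeBdd><CdTypeBanque>3</CdTypeBanque></TypeBdd></Bdd>
  <Bdd><CdBdd>B2</CdBdd><LbBdd>Base 2</LbBdd><TypeBdd><CdTypeBanque>1</CdTypeBanque></TypeBdd></Bdd>
  <DispositifCollecte><BddRdd><Bdd><CdBdd>B2</CdBdd></Bdd></BddRdd></DispositifCollecte>
</SI_DC>"""
root = ET.fromstring(SI_DC)

# for $a in input()/SI_DC/Bdd where $a/TypeBdd/CdTypeBanque = '3' return $a
type3 = [a for a in root.iterfind("Bdd") if a.findtext("TypeBdd/CdTypeBanque") == "3"]

# for $a in input()/SI_DC/Bdd, $b in input()//BddRdd
# where $b/Bdd/CdBdd = $a/CdBdd return ($b/Bdd/CdBdd, $a/LbBdd)
joined = [
    (b.findtext("Bdd/CdBdd"), a.findtext("LbBdd"))
    for a in root.iterfind("Bdd")         # direct Bdd children only, like /SI_DC/Bdd
    for b in root.iterfind(".//BddRdd")   # any depth, like //BddRdd
    if b.findtext("Bdd/CdBdd") == a.findtext("CdBdd")
]

print([a.findtext("CdBdd") for a in type3])  # ['B1']
print(joined)                                # [('B2', 'Base 2')]
```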

  • XQuery principles: sorting

    Update functions: update replace input()/SI_DC/DispositifCollecte/MnRdd[.="RBESOUQAP"] with <MnRdd>test</MnRdd>, then check the result with for $a in input()//DispositifCollecte return $a/MnRdd

    Text search
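
    The update and a sorted read-back can be sketched with ElementTree. The `[.='text']` predicate requires Python 3.7 or later. The document and every value other than RBESOUQAP are hypothetical. Replacing the whole MnRdd element is one reading of the update statement.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<SI_DC>"
    "<DispositifCollecte><MnRdd>RBESOUQAP</MnRdd></DispositifCollecte>"
    "<DispositifCollecte><MnRdd>RCS</MnRdd></DispositifCollecte>"
    "</SI_DC>"
)

# update replace .../DispositifCollecte/MnRdd[.="RBESOUQAP"] with <MnRdd>test</MnRdd>
for dispositif in root.iterfind("DispositifCollecte"):
    for old in dispositif.findall("MnRdd[.='RBESOUQAP']"):
        new = ET.Element("MnRdd")
        new.text = "test"
        position = list(dispositif).index(old)  # keep the element at the same place
        dispositif.remove(old)
        dispositif.insert(position, new)

# for $a in input()//DispositifCollecte return $a/MnRdd, sorted as an "order by" would
values = sorted(mn.text for mn in root.iterfind(".//DispositifCollecte/MnRdd"))
print(values)  # ['RCS', 'test']
```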

  • Storing and retrieving documents

  • Main solutions for storing XML documents: the file system, a relational or object-oriented database, or an XML database (XML-enabled or native). Application mapping and XML middleware link the documents to the database.

  • File system storage: very simple. Tools such as grep can be used to access the XML. Not very efficient.
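
    The following Python sketch performs a grep-like search over a directory of XML files, using only the standard library. The file names and contents are hypothetical. The pattern has to include the markup itself, because this approach knows nothing about the document structure.

```python
import re
import tempfile
from pathlib import Path

def grep_xml(directory: Path, pattern: str):
    """Yield (file name, line number, line) for each line of each .xml file matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(directory.glob("*.xml")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                yield path.name, lineno, line.strip()

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.xml").write_text("<Parametre>\n  <Nom>Nitrates</Nom>\n</Parametre>\n", encoding="utf-8")
    (d / "b.xml").write_text("<Parametre>\n  <Nom>Phosphates</Nom>\n</Parametre>\n", encoding="utf-8")
    hits = list(grep_xml(d, r"<Nom>Nitr"))  # consume before the directory is deleted

print(hits)  # [('a.xml', 2, '<Nom>Nitrates</Nom>')]
```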

  • Storing files in a BLOB: map the XML documents to a binary field (BLOB) and use full-text technologies. Diagram: XML documents (unstructured) stored in a BINARY column of the database schema.
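
    This minimal sketch of BLOB storage uses SQLite, through Python's sqlite3 module, as a stand-in for the database. The search relies on SQLite's instr(), which compares raw bytes when both arguments are BLOBs. As a result it also matches tags, so it is only a crude substitute for real full-text technologies. The stored document comes back byte for byte.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT PRIMARY KEY, body BLOB)")

def store(name: str, xml_bytes: bytes) -> None:
    # The whole document is one opaque value: no parsing, no structure.
    conn.execute("INSERT INTO documents VALUES (?, ?)", (name, xml_bytes))

def find(term: str) -> list:
    # instr() on two BLOBs searches raw bytes: markup and content are not distinguished.
    rows = conn.execute(
        "SELECT name FROM documents WHERE instr(body, ?) > 0 ORDER BY name",
        (term.encode("utf-8"),),
    )
    return [name for (name,) in rows]

original = b'<?xml version="1.0"?>\n<!-- as received -->\n<Parametre><Nom>Nitrates</Nom></Parametre>\n'
store("nitrates.xml", original)
store("other.xml", b"<Parametre><Nom>Phosphates</Nom></Parametre>")

print(find("Nitrates"))   # ['nitrates.xml']
print(find("Parametre"))  # ['nitrates.xml', 'other.xml'] (the tag name matches too)
(body,) = conn.execute("SELECT body FROM documents WHERE name = ?", ("nitrates.xml",)).fetchone()
print(body == original)   # True
```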

  • Using BLOBs. Advantages: transactions, security, multi-user access, administration. Some products are XML-aware: they get around the problem of tags and are compatible with XQuery (Oracle 9i+, DB2).

  • Using a native XML database: a database optimized for storing XML that uses only XML technologies (XPath, XQuery, ...). Diagram: XML documents (unstructured) stored in a native XML database.

  • Definition of a native XML database: a marketing term introduced with Tamino rather than a technical definition.

    Definition: a logical model for XML documents. The XML document is the basic unit of storage, like a row in a relational database. There is no particular physical storage model.

  • Native XML database architecture. Text-based native XML database: stores XML as text (file system, BLOB) with indexes over all of the text. It is fast when retrieving entire documents and slow when finding pieces of many documents.

    Model-based native XML database: stores XML documents in an internal object model. Its performance is similar to text-based storage, but it is less slow when finding pieces of many documents.

    Similar to a hierarchical database
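
    Here is a toy contrast of the two architectures in Python, with documents held in memory. Real products use B+-trees, paged files and indexes instead. The sketch only shows where the parsing cost falls. It also shows why a model-based store may not give back the exact original text. All names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

class TextStore:
    """Text-based: each document is kept as the original string."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def put(self, doc_id: str, xml_text: str) -> None:
        self.docs[doc_id] = xml_text

    def get(self, doc_id: str) -> str:
        return self.docs[doc_id]  # whole document: returned as stored, no work needed

    def find_all(self, path: str) -> List[str]:
        # Pieces: every document has to be parsed again for each query.
        return [e.text for text in self.docs.values() for e in ET.fromstring(text).iterfind(path)]

class ModelStore:
    """Model-based: each document is kept as a parsed tree (the internal object model)."""

    def __init__(self) -> None:
        self.trees: Dict[str, ET.Element] = {}

    def put(self, doc_id: str, xml_text: str) -> None:
        self.trees[doc_id] = ET.fromstring(xml_text)  # parsed once, at load time

    def get(self, doc_id: str) -> str:
        return ET.tostring(self.trees[doc_id], encoding="unicode")  # rebuilt from the model

    def find_all(self, path: str) -> List[str]:
        return [e.text for tree in self.trees.values() for e in tree.iterfind(path)]

DOC = "<Parametre><!-- note --><Nom>Nitrates</Nom></Parametre>"
text_store, model_store = TextStore(), ModelStore()
for store in (text_store, model_store):
    store.put("p1340", DOC)
print(text_store.find_all("Nom") == model_store.find_all("Nom"))  # True
print(text_store.get("p1340") == DOC)   # True
print(model_store.get("p1340") == DOC)  # False: the comment is not kept in the model
```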

  • Features of native XML databases. Document collections: a collection plays the role of a table in a relational database, and collections form a hierarchy like directories in a file system. Query languages: support for one or more query languages (XPath, XQuery, ...). Transactions, locking and concurrency: transactions are supported, with locking at the level of entire documents. Element-level locking may come in the future.
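
    The following minimal in-memory Python sketch combines hierarchical collections with document-level locking. Each document gets its own lock, so two writers on the same document are serialized. Nothing locks individual elements. The class and method names are hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

class Collection:
    """A collection (roughly a table) that may contain sub-collections, like directories."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: Dict[str, "Collection"] = {}
        self.documents: Dict[str, str] = {}  # document name -> XML text
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()       # protects the lock table itself

    def sub(self, name: str) -> "Collection":
        if name not in self.children:
            self.children[name] = Collection(name)
        return self.children[name]

    @contextmanager
    def locked(self, doc_name: str) -> Iterator[None]:
        # Document-level granularity: one lock per document, never per element.
        with self._guard:
            lock = self._locks.setdefault(doc_name, threading.Lock())
        with lock:
            yield

    def update(self, doc_name: str, transform: Callable[[str], str]) -> None:
        # Read-modify-write of the whole document under its lock.
        with self.locked(doc_name):
            self.documents[doc_name] = transform(self.documents.get(doc_name, ""))

def resolve(root: Collection, path: str) -> Collection:
    """Walk a slash-separated path such as 'sandre/schemas' through nested collections."""
    collection = root
    for part in path.split("/"):
        collection = collection.children[part]
    return collection

db = Collection("db")
schemas = db.sub("sandre").sub("schemas")

def writer() -> None:
    for _ in range(50):
        schemas.update("log.xml", lambda text: text + "<x/>")

threads = [threading.Thread(target=writer) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(resolve(db, "sandre/schemas").documents["log.xml"].count("<x/>"))  # 400
```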

  • Features of native XML databases. Application programming interfaces (APIs): programmatic APIs, either proprietary ones or XQuery for Java. Most XML databases can also execute queries sent over HTTP. Round-tripping: store an XML document and get the same document back. Remote data / repository: some databases include remote data and a repository.
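
    This sketch sends an HTTP query from Python using only the standard library. It assumes eXist's REST interface, with the `_query` and `_howmany` request parameters, at a local default address. The base URL and the collection name are assumptions to check against the server's documentation. Only the URL construction runs without a server.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

# Assumption: a local eXist server exposing its REST interface at this address.
BASE_URL = "http://localhost:8080/exist/rest/db"

def query_url(collection: str, xquery: str, howmany: int = 10) -> str:
    """Build a GET URL carrying the XQuery in the _query parameter (URL-encoded)."""
    params = urlencode({"_query": xquery, "_howmany": howmany})
    return f"{BASE_URL}/{collection}?{params}"

def run_query(collection: str, xquery: str) -> str:
    """Send the query and return the raw XML response (requires a running server)."""
    with urlopen(query_url(collection, xquery), timeout=10) as response:
        return response.read().decode("utf-8")

url = query_url("sandre", "count(//Parametre)")
print(url)
# http://localhost:8080/exist/rest/db/sandre?_query=count%28%2F%2FParametre%29&_howmany=10
```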

  • Features of native XML databases. Indexes, a way to increase query speed: structural indexes and full-text indexes.

    External entity storage: the problem of how to store external entities
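
    Here is a toy Python sketch of the two index kinds, built over a hypothetical in-memory set of documents. A structural index maps each element path to the documents that contain it. A full-text index maps each word to the documents whose text contains it. Real products persist these structures on disk.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set, Tuple

def build_indexes(docs: Dict[str, str]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    structural: Dict[str, Set[str]] = defaultdict(set)  # "/a/b" path -> document ids
    fulltext: Dict[str, Set[str]] = defaultdict(set)    # lower-cased word -> document ids
    for doc_id, xml_text in docs.items():
        stack = [(ET.fromstring(xml_text), "")]
        while stack:
            elem, parent_path = stack.pop()
            path = f"{parent_path}/{elem.tag}"
            structural[path].add(doc_id)
            # Index the element's own text and the text that follows it (its tail).
            for word in re.findall(r"\w+", f"{elem.text or ''} {elem.tail or ''}"):
                fulltext[word.lower()].add(doc_id)
            stack.extend((child, path) for child in elem)
    return structural, fulltext

docs = {
    "d1": "<Parametre><Nom>Nitrates</Nom></Parametre>",
    "d2": "<Parametre><Nom>Phosphates</Nom><Symbole>PO4</Symbole></Parametre>",
}
structural, fulltext = build_indexes(docs)
print(sorted(structural["/Parametre/Nom"]))  # ['d1', 'd2']
print(sorted(fulltext["nitrates"]))          # ['d1']
```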

  • An example: Tamino (Software AG), a server-oriented native XML database. Features: proprietary model-based storage, data storage and search through a query interpreter, built on XML standards, and connectivity with XML software.

  • An example: Tamino (Software AG). XML Engine: native XML storage and query processors for XQuery and X-Query (!). Data Map: metadata about XML documents, describing where the data in a given XML document is stored. X-Tension: access to various external applications. Management / security.

  • Tamino demonstration

  • An example: eXist (http://exist.sourceforge.net/), an open-source native XML database. Features: proprietary text-based storage (B+-trees and paged files), hierarchical collections, automatic indexes, XQuery support, full-text search, access over HTTP or from Java, XUpdate support, and no transaction support.

  • eXist demonstration

  • eXist: the SANDRE example. Store the XML schemas published by SANDRE

    Find elements and their characteristics

    Download schemas

    Parse XML documents

  • A SANDRE example: the schema is an XML document

    The DB schema is the eXist XML database

    Searching and publishing tags is done with XQuery
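
    Because a schema is itself an XML document, finding markup and its characteristics amounts to querying that document. The Python sketch below runs over a hypothetical schema fragment. Its element names and types are invented, not taken from a published SANDRE schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # ElementTree's {namespace}tag notation

# Hypothetical schema fragment.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CdParametre" type="xs:string"/>
  <xs:element name="NomParametre" type="xs:string"/>
  <xs:element name="DateCreationParametre" type="xs:date"/>
</xs:schema>"""

def element_declarations(schema_text: str) -> dict:
    """Map each named xs:element declaration to its declared type (None if untyped)."""
    root = ET.fromstring(schema_text)
    return {e.get("name"): e.get("type") for e in root.iter(f"{XS}element") if e.get("name")}

print(element_declarations(SCHEMA)["DateCreationParametre"])  # xs:date
```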

  • The market for XML database products. Categories of XML products:

    Middleware: software you call from your application to transfer data between XML documents and databases

    XML-enabled databases

    Native XML databases

    XML servers: XML-aware J2EE servers, web application servers, integration engines and custom servers

    Wrappers: software that treats XML documents as a source of relational data. These products typically query XML documents using SQL.

    XML query engines: standalone engines that can query XML documents

    XML data binding: products that can bind XML documents to objects. Some of them can also store and retrieve objects from a database.

  • Products: middleware. Lots of products!

  • Products: XML-enabled databases. The major databases are XML-enabled if you buy a recent version, but the implementations of their XML features differ considerably.

  • Products: native XML databases. Many open-source implementations. Commercial ones?

  • Products: others. XML servers: Cocoon (Apache), ColdFusion (Macromedia), Zope (open source)

    XQuery engines: BQ-XQuery engine, WebLogic Integration XQuery engine, DataDirect XQuery, XQEngine, XQuery for .NET, XQuery Processor. Data binding: Castor, JAXB for Java, Jakarta (Apache), XDK (Oracle), XML definition tool (.NET). Wrappers: SQL Server, DB2 Information Integrator

  • Advantages / disadvantages of relational databases. Very widely used!

    Rules to organize data (data modelling), managed by the RDBMS

    Efficient with data

    Allows all types of search

    SQL standard

    Complex (and slow) to retrieve hierarchical data (requires joins, ...)

    Poor performance with documents

    Needs an engine to generate and understand XML documents

  • Advantages / disadvantages of XML-enabled databases: a database that stores relational data AND XML documents

    Rules to organize data (data modelling), managed by the DBMS

    Efficient with data and for mapping XML data

    SQL extensions, so fewer new skills to acquire

    Complex to manage

    Data type conversion and null data

    Binary data?

    Character sets

  • Advantages / disadvantages of native XML databases: everything is designed to manage XML documents

    Efficient at storing and using XML documents

    Performance when finding a document or a fragment of a document

    XQuery / XPath

    No mandatory normalization

    Referential integrity problems

    Lower performance when searching for pieces of documents

    Scalability

    Tools to develop applications

    Other types of data?

    New skills

  • Comparison of relational, XML-enabled and native XML databases

  • Links about XML databases

    XML family: www.w3c.org

    Introduction to native XML databases: http://www.xml.com/pub/a/2001/10/31/nativexmldb.html

    History of XML databases: http://www.eaijournal.com/PDF/XMLMcGoveran.pdf

    XQuery: http://www.xml.com/pub/a/2002/10/16/xquery.html

    Oracle: http://www.oracle.com/technology/tech/xml/index.html

    Tamino: http://www1.softwareag.com/Corporate/products/tamino/default.asp

    Xindice: http://xml.apache.org/xindice/

    eXist: http://exist.sourceforge.net/

    SANDRE: www.sandre.eaufrance.fr

    XML-EAU: http://www.sandre.eaufrance.fr/francais/frame/sagen.htm?page=../../xmleau/index.html