1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO...

27
1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA- LEO Workshop Sources Ouvertes et Service – Caen 2010

Transcript of 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO...

Page 1: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

1

Le Store WebContent

Benjamin NGUYEN UVSQ & INRIA-SMISSpyros ZOUPANOS U.Paris Dauphine & INRIA-LEO

Workshop Sources Ouvertes et Service – Caen 2010

Page 2: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

2

Plan

1. Enjeux de l’utilisation d’un SGBD XML

2. Exemple du store centralisé

3. Store P2P

Page 3: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

3

Enjeux de l’utilisation d’un SGBD XML

Page 4: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

4

Architecture de WebContent

XML !!

Page 5: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

5

Le format pivot (UML)

Page 6: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

6

Le format pivot (exemple XML)

•Les schémas XML

•Un exemple de document

Page 7: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

7

Paradigme des Services Web

•Un modèle pivot de documents basé sur UML et implémenté par un XML Schema

•Des données échangées via des services web en XML

•Un traitement en interne des informations par des programmes en C, Java, etc…

Page 8: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

8

Quelle utilisation des documents ?

•Stockage• Définition d’une URI• Sert de référence future

•Récupération• Intégralité du document indexation des documents• Sous partie du document indexation structurelle

•Requêtes utilisation d’un langage de requête (XQuery)• Modification de la structure du document• Jointure entre documents

•Modification (possiblement concurrente) des documents lors d’une chaîne de traitement problème d’incohérences / redondance

•Il faut utiliser un véritable SGBD et pas simplement des fichiers !

Page 9: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

9

Store centralisé

Page 10: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

10

Construction du StoreService

•Problème : • Construire un wrapper de Web Service sur une base de données XML

déjà existante (eXist, MonetDB, SQL Server, Quizx, … Sedna ?)

• Choix d’un SGBD existant performant & fonctionnel (et des interfaces existant par rapport au langage d’interrogation)

•Caractéristiques• Passage à l’échelle

– Nombre de documents (ok?)

– Nombre de requêtes (pb?)

• Robustesse

• Reprise sur panne

Page 11: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

11

Fonctionnalités

•Stockage d’un document du modèle• Génération de l’URI

•Récupération du document• Dans son intégralité

• Juste une sous partie

• Basé sur l’utilisation des URIs WebContent

•Requêtes XQuery complexes sur les documents

Page 12: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

12

Démo du store centralisé

Page 13: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

13

Enseignements

•L’utilisation simple de fichiers n’aurait pas fonctionné

•Le système passe à l’échelle en terme de documents (plusieurs milliers)

•Un système d’indexation simple aurait pu suffire

•Peu de fonctionnalités véritables du SGBD ont été utilisées

•Il est difficile de séparer le service de stockage du service d’interrogation

Page 14: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

14

3- Store P2P

Page 15: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

15

Outline of P2P services

SOAP access

Local P2P SPARQLprocessor

Local P2P XML- AXMLquery processor

Local P2P XML- AXMLquery optimizer

XML store +Xquery processing

Local part ofthe P2P index

File storage

Peer 2

Peer 4

Peer 1

Peer 3

Page 16: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

16

Peer-to-peer storage service

Implemented jointly by several peers

DHT service is implemented on top of DHTs

Basic functionality of a DHT• put(k,v)

• get(k)

Different DHTs may have different algorithmic properties• Goal: combine the advantages of the available DHTs

Page 17: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

17

KadoP

Index keys:• all element & attribute names

Index values:• structural identifiers (docID, start, end)

Kadop’s query language:• tree pattern language

• each node is labeled with a XML node or word

• edges are / or //

Query evaluation:• holistic join on the list of identifiers associated to the query node labels

Advantage:• fast tree patern query execution based on the one and only holistic join

Page 18: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

18

PathFinder

Index keys:• linear parent-child rooted paths

Index values:• sorted docId list of documents that contain a given path

PathFinder’s query language:• conjunctions of disjunctions of atomic path expressions

• atomic path: XPath with child axis + predicate

Special hash function:• keys that are lexicographically close are stored to peers that are close between themselves

Advantage:• inequality queries e.g. /article[ year > 2005 ]

Page 19: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

19

<myPage><axml:sc service="getProgram" peer="tvchannel.com">

<parameter>Movies</parameter></axml:sc>

</myPage>

AXML language

Data-centric Web service composition

ActiveXML document = XML document including calls to (continuous) Web services

• A service call contains contact info for the Web service

• When the calls is activated, results are added to the document as siblings of the service call.

<myPage><axml:sc service="getProgram" peer="tvchannel.com">

<parameter>Movies</parameter></axml:sc><program day="today"><movie>Shrek 3</movie></program>

</myPage>

<myPage><axml:sc service="getProgram" peer="tvchannel.com">

<parameter>Movies</parameter></axml:sc><program day="today"><movie>Shrek 3</movie></program><program day="tomorrow"><movie>Persepolis</movie></program>

</myPage>

Page 20: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

20

AXML evaluation

AXML document d@p1, sc in d calls s@p2($in)

Activating sc entails:• stream $in to p2

• evaluate s@p2

• stream the results of s@p2 to p1

p1 p2

s@p2

d

Page 21: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

21

OptimAX

A rule based AXML optimizer• uses an extensible set of rules (cooperates with other programs to perform better document opimization)

• rewrites an AXML document to another one with smaller overall evaluation cost

• uses statistics for document optimization

Page 22: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

22

Putting everything together

When a query arrives:• it is introduced into a document d1

• d1 is given to optimAX for optimization

• optimAX will cooperate with TGV – an algebraic XQuery compiler – to decompose it into sub-queries

• based on the type of the sub-query, optimAX will create a call to the right DHT

• a compensation query will be added in the new document d2 above all the calls to the DHTs

• further optimization if needed will be performed

• d2 will be given to the AXML engine for evaluation

Page 23: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

23

Outline of P2P services

SOAP access

Local P2P SPARQLprocessor

Local P2P XML- AXMLquery processor

Local P2P XML- AXMLquery optimizer

XML store +Xquery processing

Local part ofthe P2P index

File storage

Peer 2

Peer 4

Peer 1

Peer 3

Page 24: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

24

Example

Original query

for $d1 in /Col1/Book, $d2 in /Col2/Book where $d1/PageNo < 400 and $d2/Author = "Ullman" and $d2/Code = $d1/Codereturn $d2/Title

OptimAX TGV

Page 25: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

25

Example

Join query

for $x_0 in $in_0//res, $x_1 in $in_1//res where $x_0//Code= $x_1/Code return $x_0//Title

Join query

for $x_0 in $in_0//res, $x_1 in $in_1//res where $x_0//Code= $x_1/Code return $x_0//Title

Tree pattern query

<Col2> <Book returned="true"> <Author>Ullman</Author> <Code/></Book></Col2>

Tree pattern query

<Col2> <Book returned="true"> <Author>Ullman</Author> <Code/></Book></Col2>

Query with inequality

for $d1 in /Col1/Book where $d1/PageNo < 400 return <res>{$d1/Code}</res>

Query with inequality

for $d1 in /Col1/Book where $d1/PageNo < 400 return <res>{$d1/Code}</res>

Page 26: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

26

SPARQL processor

In WebContent there are semantic data to be queried.

A semantic peer marks available resources according to the appropriate ontologies.

Semantic peers interact with each other by means of mapping.

With the use of ontologies and mappings, reasoning can be inferred.

Page 27: 1 Le Store WebContent Benjamin NGUYEN UVSQ & INRIA-SMIS Spyros ZOUPANOS U.Paris Dauphine & INRIA-LEO Workshop Sources Ouvertes et Service – Caen 2010.

27

Semantic example

Query:

Q(X) :- P1:Aircraft(X)

Max conjunctive rewritings:

R1(X) :- P1:Aircraft(X)

R2(X) :- P2:Airplane(X)

R3(X) :- P2:Airliner(X)

R4(X) :- P2:Cargo(X)

Aircraft

Airplane

Airliner Cargo

P1

P2