1 Multidatabase manipulations Part 2 Witold Litwin .

82
Multidatabase manipulations Part 2 Witold Litwin http://ceria.dauphine. fr / witold .html
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    224
  • download

    0

Transcript of 1 Multidatabase manipulations Part 2 Witold Litwin .

1

Multidatabase manipulations Part 2

Multidatabase manipulations Part 2

Witold Litwin

http://ceria.dauphine.fr/witold.html

2

Multidatabase manipulations(Kandinsky: Ligne avec Accompagnement, 1937 )

Multidatabase manipulations(Kandinsky: Ligne avec Accompagnement, 1937 )

3

Multidatabase manipulations

4

MSQL(Litwin, Abdellatif, Nicolas, Zeroual, 1989

L. Suardi, M. Rusinkiewicz, 1992)

An extension to SQL– Contains by definition every SQL-x

Allows for non-procedural multidatabase base manipulations MSQL queries impossible to formulate in SQL An MSQL query may replace several SQL queries

Developed in 1986-89» INRIA, projet B A BA,

initialement sous projet du projet-pilote SIRIUS (J. Le Bihan, puis W. Litwin)

» Dr. Thesis of MM. Abdellatif, Nicolas, Zeroual

Compiler implemented at Houston University– Team od Prof. M. Rusinkiewicz, 1990-1993

5

MSQL(Litwin, Abdellatif, Nicolas, Zeroual, 1989

L. Suardi, M. Rusinkiewicz, 1992)

Research vehicle for functions for the MBD environment – to address relations in different databases– to manipulate semantically heterogeneous data– to create MDB views– to transfer data (and schemas) between DBs – to define MDB dependencies

Present to limited extent in most of commercial DBMSs &DAMSs

6

MSQL(Basic new properties)

SQL Query– Uses 1st order predicate calculus

– Is compiled for optimization into the relational algebra

– Result is a table

MSQL Query– May use higher-order predicate calculus

– Is compiled for optimization into the multirelational algebra

– Result is a multitable» A set of relations (tables)

» May be constituted from one or no tables

7

MSQL(More on functions specific to MDB env.)

Addressing of tables in different DBs– Implicitly or by qualification by (multi)database

names» Introduced around 1985 by relational multibase system

prototype MRDSM – B A BA project at INRIA

» Unknown at that time of any relational language See the overview of relational DBMSs existing in 1987 (M.

Brodie)

8

MSQL(More on functions specific to MDB env.)

Manipulation of semantically heterogeneousdata

– Multiple Queries » With multiples identifiers» With semantic variables

Ranging over data names

– Scale and Precision – Units of measure– Implicit joins

Capabilities still unknown of SQL Capabilities known at present to some dialects

– Limited with respect to MSQL

9

MSQL : example

SIL SIL SIL

SGCIC BNP

View

SIL = Internal Logical Schema

View

10

Conceptual Schemas (the multischema)

DB bnp : br (br#, brname, street, street#, city, zipcode, tel)

account (acc#, cl#, balance, br#)client (cl#, clname, cltel, cltype, street, street#, city, zipcode)spe-acc (acc#, br#, cl#, balance, curr)

DB sg : branch (bra#, braname, street, s#, town, zip, t#, class)

acc (acc#, bra#, c#, balance)client (c#, cname, ct#, ctype, street, s#, town, zip)

DB cic : br (br#, brname, street, street#, city, zipcode, tel)

account (ac#, br#, cl#, balance, open_date)client(cl#, clname, cltel, cltype, street, street#, city, zipcode)

11

Semantic Heterogeneity In Banks

Same names can designate different data Different names can designate same data

– same client, same town..

The value of a primary key is valid only in one DB– how to identify same client in diff. banks ?

12

MSQL Commands

CREATE TABLE CREATE DATABASE CREATE MULTIDATABASE CREATE VIEW ALTER TABLE ALTER VIEW ALTER MULTIDATABASE DROP TABLE DROP DATABASE DROP MULTIDATABASE DROP VIEW

13

MSQL CREATE DATABASE

Query scope

> MSQLCREATE DATABASE boulogne ;

CREATE DB |.com.org.user.boulogne ;CREATE MULTIDATABASE Banks (bnp cic sg );

USE Banks;CREATE DATABASE boulogne FROM bnp ;

14

MSQL CREATE MULTIDATABASE

MSQL CREATE MDB EC-Banks (f-banks-i-banks, s-banks, g-

banks, e-banks );

CREATE MULTIDATABASE can create :

– flat MDBs (only contain DBs)

– nested MDBs (DBs or MDBs)

» can be potentially any network of DBs or MDBs like through the links on the WEB what about cycles ?

15

MSQLCREATE TABLE

use banks ;CREATE TABLE boulogne.loan FROM bnp.loan ;

CREATE TABLE fake_checks (Chq# INT, Montant_Euro CURRENCY [EURO] .... );

One has created four (empty) tables :bnp. fake_checks , cic. fake_checks ... boulogne.fake_checks

CREATE TABLE boulogne.client (c#, cn, ct#) FROM bnp.client (cl#, clname, cltel) PRIMARY KEY (c#) (cn, ct#) OUTER REFERENCES (clname, cltel);

Unit ofmesure

Import

16

MSQLCREATE TABLE with References

USE AuPrintemps /* MDB AuPrintemps

CREATE TABLE MusicDep.Inventory ….FOREIGN KEY (Item#) REFERENCES Central.Stock(I#);

No unauthorized Item# in the inventory of the Music Department

Other options PRIMARY KEY (…) REFERENCES T(…) ; [T1(A)] [LEFT|RIGHT] REFERENCES T2(B) ;

– Generates implicit equijoin, or left or right implicit outerjoins when a query selects attributes A and B.

17

MSQLALTER MULTIDATABASE

use banks ;

alter banksinclude vernesremove cic

Alter MDB can create

– flat MDBs (only contain DBs)

– nested MDBs

18

MSQL Elementary queries

USE bnp cicSELECT bnp.br.brname, cic.br.braname, bnp.br.street

FROM bnp.br, cic.brWHERE bnp.br.street = cic.br.street ;

bnp.br.brname cic.br.braname bnp.br.street

vaugirard 3 bd. montparnasse vaugirard

Prefixing with DB names was unknown to SQL- and is in DB2 SQL since last year only

19

MSQL Default DB

USE bnpSELECT br.brname, cic.br.brname, br.street

FROM br, cic.brWHERE br.street = cic.br.street ;

br.name cic.br.brname br.street

vaugirard 3 bd. montparnasse vaugirard

Tables of the default database are not prefixedTables of the default database are not prefixed

20

MSQL Elementary queries without prefixed names

USE bnp sgSELECT br.brname, branch.braname, br.street

FROM br, branchWHERE br.street = branch.street ;

br.name branch.braname br.street

vaugirard 3 bd. montparnasse vaugirard

Table names are unique within the query scope

21

Updates

USE (bnp b) sg ;

UPDATE account

SET account.balance = account.balance + 500

WHERE account.balance > acc.balance

AND b.client.clname = sg.client.cname AND b.client.street = sg.client.street ;

What does it mean ?

22

Multiples QueriesUSE BanksSELECT *

FROM br%WHERE street = 'champs elysées' ;

23

Multiple QueriesUSE Banks SELECT * FROM br% WHERE street = 'champs elysées' ; USE bnp SELECT * FROM br WHERE street = 'champs elysées' ; USE sg SELECT * FROM branch WHERE street = 'champs elysées' ; USE cic SELECT * FROM br WHERE street = 'champs elysées' ;

24

Results (a multitable)bnp.br

br# brname street street# city zipcode tel123 sembat champs elysées 130 Boulogne 92100 12 34456 sevres champs elysées 120 Sevres 92105 12 56

cic.br

bra# braname street st# town zip t# class123 jaures champs elysées 153 Boulogne 92100 3214 A765 sevres champs elysées 20 Sevres 92105 1243 B

sg.branch

br# brname street s# city zipcode telabc sembat champs elysées 110 Boulogne 92100 12.45a1f gare champs elysées 30 Chaville 92110 34.56

25

Multiple UpdatesBegin

Use BanksUpdate cl*set street = 'Charles de Gaulle"where street = 'Etoile'

If SQLCODE <> 0 then Rollback ;

Commit

Use Banks vital cicUpdate cl*set street = 'Charles de Gaulle"where street = 'Etoile'

MSQL transaction semantics is more general than ACID– may include COMP (compensation) statement, list of accept.

states....

26

Semantic Variables in MSQL

use bnp sglet x be town cityselect * from b%where x = 'Paris' and street = 'r. de Rivoli'

27

Semantic Variables in MSQLuse bnp sg

let x be town cityselect * from b%where x = 'Paris' and street = 'r. de Rivoli'

use bnpselect * from brwhere town = 'Paris' and street = 'r. de Rivoli'

use sgselect * from branchwhere city = 'Paris' and street = 'r. de Rivoli'

28

Semantic Variables in MSQL

use bnp sglet x be town cityselect * from b%where x = 'Paris' and street = 'r. de Rivoli'

Alternatively:

use bnp sglet x be to% cityselect * from b%where x = 'Paris' and street = 'r. de Rivoli'

29

Semantic Variables in MSQL

use banks let X be banks.*select a%, balance, c%namefrom X.a% a, X.c% c

where a. a% = c. c%

The query illustrates the multitable pair-wise join Semantic variable a over relation name account is not

necessary, but simplifies the typing of the query

33

Semantic Variables in MSQL

Semantic variables can be compound and with values selected by queries from some dictionaries

use bankslet (x, y) be :select X.attr Y.attr

from FD X, FD Y where X.mean = tel and Y.mean = city

select * from clientwhere x = '123' and y = 'Paris'

mean attrtel t#tel telcity citycity towncity burgh

FD

FD

banks

34

Semantic Variables in MSQL

Can be applied to MSQL DD statements

use bankscreate database cic2 ;

let x be a% b% c%create table cic2.x from cic.x ;

Copies cic schema except for one table

35

Name homogenizationThe labels

USE Banks ;LET t BE tel t#SELECT %name branch_name, t tel#, s%# street#FROM br% brWHERE street = ‘Champs Elysées’ ;

The result : multitable:

{( bnp.br.branch_name, bnp.br.tel#, bnp.br.street# ), ( sg.br.branch_name, sg.br.tel#, sg.br.street# )( cic.br. branch_name, cic.br.tel#, cic.br.street# )}

36

Multidatabase Views

USE my_bank bnp sg ;CREATE VIEW my_bank.same_street_branches

(bnp_name, bnp_s#, sg_name, sg_s#, street, city)AS SELECT brname, street#, braname, s#, street, city FROM bnp.br b, sg.branch s

WHERE b.street = s.street AND b.city = s.town ;

A partial view of DBs bnp and sg in DB my_bank

my_bankbnp

sg

The views in my_bankcan be considered Import Schemes

37

Multidatabase Union Views

Use BanksCreate View bnp.all-banks as

Use bankslet x be town citylet y be banks.*Select y.br% ( y, br#, br%name branch, street, street#, x city, zip% zip, t% tel)Union *

Union * unions all the tables of the selected multitable It scales to all the tables named br% of Banks, if new banks enters the MDB Banks in the futureCurrent DBMS, e.g., SQL Server, require to alter the union view definition in such a case

38

Key words and Aggregate Functions in MSQL

Key words and Aggregate Functions of SQL– par definition

» DISTINCT, GROUP BY, ORDER BY» COUNT, AVG, SUM…

– operate at each table of a multitable Their extensions to multitables

» MDISTINCT, MCOUNT, MGROUP BY, MORDER BYMAVG, MSUM...

– operate at whole multitable– important for warehousing

39

Example

USE BanksSELECT COUNT (*)FROM br% brWHERE street = 'champs elysées' ;

40

Example

USE BanksSELECT COUNT (*)FROM br% brWHERE street = 'champs elysées' ;

bnp.br2

cic.br2

sg.br2

41

Example

USE BanksSELECT MCOUNT (*)FROM br% brWHERE street = 'champs elysées' ;

42

ExampleUSE Banks

SELECT MCOUNT (*)FROM br% brWHERE street = 'champs elysées' ;

br6

Exercises in warehousing :

-Average balance per client in each bank

Average balance per client in BANKS

-Sum of client assets per bank

-Sum of client assets in BANKS

Exercises in warehousing :

-Average balance per client in each bank

Average balance per client in BANKS

-Sum of client assets per bank

-Sum of client assets in BANKS

43

Aggregate Functions IMPLEMENTATION ISSUES

All-in-one (traditional computation)– Possibly in parallel– The calculus can take long time.

Successive approximations– Some kind of sampling

» result1, from any 1st DB to come» (result1 + result2) / 2» …» sampling within each database

several ACM-Sigmod & VLDB papers dealt with query evaluation using sampling

Precomputing– Incremental evaluation using interdatabase dependencies– Common to warehousing

44

Aggregate Functions MERGE ON

form a single tuple from all the tuples of the same objet in the multitable– Uses outer jointures

Find millionaires in Banks and form the tuple for each millionaireUSE Banks ;

LET x.y BE clname.cltel cname.ct#LET z BE Banks.*SELECT *FROM z.a% WHERE z.a%.c%# = z.client.c%#AND z.a%.balance > 1 000 000MERGE ON x y ;

45

Aggregate Functions MERGE ON

USE Banks ;LET x.y BE clname.cltel cname.ct#LET z BE Banks.*SELECT *FROM z.a% WHERE z.a%.c%# = z.client.c%#AND z.a%.balance > 1 000 000MERGE ON x y ;

nulls

nullsnulls

46

Aggregate Functions NAME

Transform a name (table, attribute..) into attribute value

USE Banks ;LET x.y BE br.city branch.townSELECT %name branch_name, NAME (.x) bankFROM xWHERE y = 'Nice' UNION * ;

Note: Union * unions all the tables of the selected multitable

the result is the table :branch_name bank

Jaures CIC

DeGaulle BNP

47

Aggregate Functions CHOOSE

Chooses at most n tuples among the selected ones– the 1st found as does the function TOP (default) in any or some order,

specified by ORDER BY (default)

– strictly random (RND)

– these that were not chosen by the previous execution of the query in the same transaction (NEW)

– preferably in the DBs listed, and in the listed order

– at most j per DB

– selecting at most m tuples sharing the values of the attr. in the list A, supposed global key of some objet.

CHOOSE (n, (m, <A>), [<B>] | j, [<B>], [RND | NEW]

<A> ::= <list of attr.> <B> ::= <list of DBs>

48

Aggregate Functions CHOOSE

Choose a millionaire randomly

USE Banks ;SELECT c.*FROM c% c, a% aWHERE c.c%# = a.c%# AND a.ba% > 1.000.000CHOOSE (1) RND ;

Function very important in MBD environment – information overload

49

Aggregate Functions TIMEOUT

Fix time limit of a query– the system should possibly deliver all the relevant tuples – however, any query arriving to timeout is considered executed successfully

TIMEOUT (t [unit]) ;<unit> := ms | s | m | h | ds - seconds (default)

USE BanksSELECT *FROM br%WHERE street = 'champs elysées' TIMEOUT (10) ;

50

Aggregate Functions POST

Make a query continuous– One manipulates each tuple found during the life time of the

query

– Even those created after the query start

– TIMEOUT may be used to limit the life time

USE Immo LaCentrale Orpi ;SELECT *FROM logem%WHERE prix < 1,000,000 AND Ville = 'Paris' POST ;

51

Aggregate Functions ESTIMATE

Compute the cost of a query before the execution and can start the execution after an authorizationESTIMATE (type, price, time, count, size, report)

[WITH EXEC_PROMPT]

type of estimate : – exact (can be long to compute)– approximate

price of the query (in $, FF...). completion time number of tuples size of the resultant, in bytes report on the estimate itself

– precision...

52

Privileges in MSQL

USE bnp sg cic ;

GRANT SELECT ON client TO Nicolas Abdellatif ;

client is a multitable :

client = (bnp.client, sg.client, cic.client)

GRANT ALL ON etoile.account TO Nicolas Abdellatif FROM bnp.account ;

GRANT ALL ON etoile.account TO Nicolas FROM Zeroual ON bnp.account ;

53

Interdatabase Queries

Transfer data between DBs Source and target are multitables

INSERT...

54

Interdatabase Queries INSERT

– insert selected tuples» except these with the key already in the target

STORE– insert selected tuples

» replacing these with the key already in the target

REPLACE– insert selected tuples and delete the rest of the target

UPDATE– update the tuples selected in the target with the values in the

source tables COPY

– copies tuples and the source schema

55

MSQL There are more interesting capabilities

– e.g. Multidatabase Dependencies» referential integrity & (outer) join links» multidatabase triggers

local autonomy

» dynamic attributes for retrieval and updates

The language design will never be finished– MSQL 1, 2, 3...

MSQL : A multidatabase Language. Information Science Journal : Special Issue on Database Systems, 48, 2, (July 1989).

Execution of Extended Multidatabase SQL. Intl. IEEE Conf. on Data Eng. Vienna, 1992

56

O*SQL For OO or RO common model, consider in

addition:– MDB inheritance– MDB type/subtype integration

» derived types

– OID heterogeneity & UUIDs– Type / function value semantic heterogeneity

» dynamic type hierarchies» higher-order OO languages

Relations with inherited attributes

57

Elements of MSQL in commercial DBMSs

Main DBMSs evolved to MDBSs– yet primitive but it's better than nothing

Sybase, Oracle, Informix, MsAccess, SQL Server,....

There are also MDBSs which are only access systems to DBMSs EDA-SQL, DEC DB Integrator, DBJoiner (IBM),

Ingres*, UniSQL/M, Uniface, Q+E, OAdaptor (HP), Telebase...

"Data Warehouses"

58

MSQL in commercial DBMSs(Department Store Data Warehouse, using MsAccess, SQL Server...)

MusicDep

JeansDep

Home Appl.Dep

FoodDep

Bd. Haussman

JeansDep

Home Appl.Dep

FoodDep

Parly 2

BooksDep

Payroll

Orders

Au Printemps

CentralWarehouse

MusicDep

Car

59

MBD Manipulations in MsAccess

One can perform limited MBD operations between – MsAccess DBs – An DB of MsAccess and

» any other DB under a DBMS ODBC compatible

» Paradox, Btrieve, Dbase

» Any OLE compatible program Excel...

60

MBD Manipulations in MsAccess

MsAccess

Attach

Paradox Excel Oracle Sybase

B1

ParadoxGateway

ODBC

Sybase ODBCdriver

ODBC

Import

ExportDistr.Connect.

B3

B2

Insert INTO

61

MsAccess & MSQL

Open B <=> USE B ATTACH table

Open B1 attach B2.T' as T create view B1.T as select * from B2.T'

» DROP VIEW corresponds to Delete in MsAccess menu

Clause IN <externalDB>Open B1

Select a, b, c From D IN B2 select a, b c from B2.D

62

Examples MsAccess Source DB: MsAccess

SELECT [Customer ID]FROM Customers IN MYDATA.MDBWHERE [Customer ID] Like "A*";

Source DB: Paradox SELECT [CustomerID]FROM CustomersIN "C:\PARADOX\DATA\SALES" "Paradox 4.x;"WHERE CustomerID Like "A*";

Every data transfer from/to DB non-MsAccess or OLE compatible software has data repr. conversions– Semantic Heterogeneity oblige

63

Elementary Queries in MS-Access

Open a DB and query other DBs

– one has to define aliases in FROM DB open here is called s-p1.mdb

– but this name has no importance here

Joins of tables in other databases

SELECT TOP 10 C.[Contact Name], C.City

FROM [c:\access\nwind2.mdb].Customers AS C, [c:\access\ordentr2.mdb].customers AS O

WHERE (o.Id= C.[customer Id]);

64

Result

Contact Name CityPat Parkes LondonGladys Lindsay SeattleElizabeth Lincoln TsawassenOlivia LaMont San FranciscoTerry Hargreaves LondonElizabeth Brown LondonSylvia Dunn LondonAnn Devon LondonRonald Merrick LondonBill Lee Pocatello

65

Join of a local and external table

SELECT TOP 10 S.SName, C.[Contact Name], C.City

FROM S, [nwind2.mdb].Customers AS C

WHERE ((S.City= C.City))

Order by [contact name];

Elementary Queries in MS-Access

66

SName Contact Name CityClark Ann Devon LondonClark Archibald Langford LondonClark Cornelia Giles LondonClark David Bird LondonClark Elizabeth Brown LondonClark G.K.Chattergee LondonClark Gerald Pipps LondonClark Hari Kumar LondonClark Jane Austen LondonClark Jeffrey Jefferies London

Result

67

MsAccess & MSQL

Clause INTO <externalDB> dans Select INTO ou INSERT INTOOpen B1

Select a, b, c INTO T IN B2 From D Use B1 ;copy into B2.T

select a, b c from D ;» D can be a view or a subquery» One cannot combine clauses IN et INTO

– INSERT de MsAccess has (sub)semantics of INSERT in MSQL

68

MsAccess & MSQL

IMPORT & EXPORT– menu commands – equivalent to MSQL query

Use B1 ;copy into T1 * from B2.T2 ;

69

MsAccess & MSQLComparison

Formulation of MBD elementary queries and views– first one has to define ATTACH's– then one formulates SQL monodatabase query– then, perhaps one needs to delete the ATTACH's

Much more procedurality than under MSQL– in Banks, one would need in practice that each DB attaches

all the tables of any other DB» Good luck DBA !

Multiple queries and other capabilities of MSQL» yet unknown of MsAccess

70

SQL Server, Sybase, Interbase MBD Architecture similar to that of MsAccess, but more powerful

– gateways to Oracle, IMS, DB2– ODBC

Transac-SQL support the following MSQL functions and is the MBD dialect least procedural in the industry– elementary queries

» to Sybase DBs at the same siteUSE B ;select * from T where B1.T1.a = T.a ;

» Only one DB per USE » some restrictions at the level of interdatabase queries

– multidatabase CREATE VIEW, and MDB triggers

71

Oracle, RDB, Informix Have an operation similar to ATTACH called

Create link:

Create public database link bnp connect to bnp_unix

Create public database link cic connect to cic_vms

SELECT br.brname, b.braname, br.streetFROM br @bnp, br@ cic bWHERE br.street = b.street ;

72

Oracle, RDB, Informix

MBD queries are possible only once the links are defined– Hence these DBMSs are + procedural than Sybase

for MBD operations

Starting from V7, Oracle supports however MBDs queries without links

» postfixing par the DB name as in the last ex.

73

EDA-SQL, DB Integrator, DBJoiner, Ingres* & al SQL MDBSs for access to DBMSs

– in theory, without their own DBs

» but there is always one for the MDB catalogs

auxiliary DB One has to create links and logical DBs

» almost virtual DBs only DB Integrator supports elem.

MDB queries – called multischema queries

No other MSQL functions

logical DB

logical DB

BD lMS

BD RDB

BD Ingres

Gateway

OD

BC

74

UniSQL & O-Adaptor

Similar to previous ones except that for the logical DB – UniSQL uses RO model

– O-Adaptor an OO model

No MDB queries (other than link creations)

logical DB

logical DB

BD lMS

BD RDB

BD Ingres

Gateway

OD

BC

75

Telebase (USA) MDBMS for access to inf. retr. DBs

– 1000+ DBs at many sites» with different local languages

STAIRS, INSPEC, DIALOG...

Extended Common Command Set Language (CCS)

No joins only Boolean clauses Supports the MSQL functions

– multidatabases names » Called Categories

– multiple queries » Called Scans DBs

INSPECDBs

STAIRS

DBsDIALOG

Drivers

CCS

76

Messidor : 1st Heterogeneous Multidatabase Information Retrieval Access System

Démonstration par C. Moulinoux (STERIA), INRIA, 1987

77

Meta-search engines

Metacrawler, BigHub.com, AskJeewes.com, Copernic…

Query simultaneously several search engines – Altavista, Yahoo, Excite, Hotbot…

Boolean Manipulation Langages Multiples Queries

– Apply the mdb aggregate functions Mdistinct Name, Mdistinct, Choose, Timeout…

78

Data Warehouses Popular new concept for MDBSs

– data warehouse <=> an MDB or federation in an enterprise

– With elaborated management of interdatabase dependencies

– new ideas:» elaborated DS implementation

elaborated decision support functions

incremental propagation

» an MDB view or a DB redundant with respect to existent ones is created

Data warehouse

DB lMS

DB RDB

DB Ingres

Gateway

OD

BC

Data mart

79

International Journal of Cooperative Information Systems

Special Issue on Design and Management of Data Warehouses

Guest editors: Manfred A. Jeusfeld and Martin Staudt

Data Warehousing embraces technology and industrial practice to systematically collect data from the enterprise and to use that data in ahighly aggregated form for managing the enterprise thru decisions. Littleattention is currently paid to design and manage a data warehouse (DW) insuch a way that it accomplishes its purpose, i.e. to support the managementof the enterprise. Existing solutions are focusing on technical aspectslike efficient source data extraction. Their parameters are howeverincomprehensible to the stakeholders who decide on the introduction of adata warehouse.

Data warehouses are important in managing large enterprises and incommunicating highly aggregated information between the variousdepartments. Interoperable tools and integrated methods to manage datawarehouses in order to fulfill the enterprise goals are desperately needed.Such tools should cover all aspects of data warehousing:

- selection of data sources- data cleaning- conceptual/logical/physical data warehouse design- enterprise modeling- data warehouse quality monitoring- data warehouse refreshment methods- architecture design- data mart customization, etc.

An Instructive Call for Papers

80

Exemple : Architecture de DB2 Data Warehouse

81

Conclusion MDB Manipulations - among most important R & D directions Other key-words:

– Interoperability

– Integration

– Distributed Heterogeneous DBs

– Data Warehouses MSQL is a research vehicle + advanced for relational MBDs The root for further resarch proposals

– MSQL with Integrity Constraints, IDL, SchemaSQL…

» For the latter, see especially ACM-TODS journal, Dec. 2001 Basic MSQL capabilities are in commercials DBMSs, Information Retrieval Ssytems, MDB

Access Systems, Data Warehouses, XML standard proposals…

– Others will follow But there is still a lot to do

– in the industry and in research

82

Exercises and Research Problems All these in the text ; mdb queries especially. Express various elementary mdb queries using Amos, MsAccess (SQL and QBE), SQL Server,

DB2, Oracle, Interbase… Invent your own instructive queries to BANKS Under MsAccess, design an MDB Form for an elementary query and for a multiple query.

Explain how you did i in a short report. Consider 3 attributes B1.T1.a, B2.T2.b, B3.T3.c. The attribute types are INT and unit of

measures are KG, G, mG. Consider that '=' operator has the usual mathematical meaning, with the usual rounding up of values with a different precision. Prove or disprove that the usual associativity of equijoins

(a JOIN b) JOIN c = a JOIN (b JOIN c)does not hold anymore. Comment on the consequences for the current relational query optimizers.

If you had to evaluate a JOIN b using manual unit conversion, would you rather convert a to b or vice versa ?

Propose and justify a reasonable algorithm for the multiple join evaluation. A unit conversion algorithm A may be a long calculus. Would you rather:

– apply A to every value V of a the manipulated table– project or order the table first, then apply A once for every different value

83

Exercises and Research Problems Propose an execution tree expressed in mdb algebra for the query of slide 30 Add unit conversion to your favorite query optimizer (Ph. D Thesis) Try to express the example queries to Banks using SchemaSQL language. For each query,

present also the result. Can it be a multirelation ? Try to express the example queries to Banks using IDL language. For each query, present also

the result. Can it be a multirelation ? Consider that bnp.balance is in US$ and cic.balance is in FF. Consider that the exchange rate is

in some table ExRa in DB called Currency. Is it possible to find accounts with the same balance in both DBs using a single MSQL query ?

Consider that to perform a multidatabase join A JON B one has to bring both tables into a database. What are your options for an elementary MDB query processing, if there are selections, joins, and projections ?

Idem, if you consider distributed join processing ? Consider a multitable R = (R1, R2, R3), a table T and the query Q

select * from R where R.a = T.a and T.b = '123' ;

What are your option for Q's optimization ?

Propose an implementation for your favorite MSQL aggregate function (Ph. D. Thesis or a part of it at least)

END