Grammar Induction: Techniques and Theory

ICML 2006, Grammatical Inference

Transcript of Grammar Induction: Techniques and Theory

Page 1: Grammar Induction: Techniques and Theory


Colin de la Higuera, Université de Saint-Etienne
Tim Oates, University of Maryland

Grammar Induction: Techniques and Theory

Page 2: Grammar Induction: Techniques and Theory


Acknowledgements

• Laurent Miclet, Tim Oates, Jose Oncina, Rafael Carrasco, Paco Casacuberta, Pedro Cruz, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal,...

• … and a lot of other people to whom we are grateful

Page 3: Grammar Induction: Techniques and Theory


Outline

1 An introductory example

2 About grammatical inference

3 Some specificities of the task

4 Some techniques and algorithms

5 Open issues and questions

Page 4: Grammar Induction: Techniques and Theory


1 How do we learn languages?

A very simple example

Carmel and Markovitch 98 & 99, http://www.cs.technion.ac.il/~carmel/papers.html

Page 5: Grammar Induction: Techniques and Theory


The problem:

• An agent must take cooperative decisions in a multi-agent world.

• Its decisions will depend:

– on the actions of the other agents;

– on what it hopes to win or lose.

Page 6: Grammar Induction: Techniques and Theory

Hypothesis: the opponent follows a rational strategy (given by a DFA/Moore machine):

[Figure: a Moore machine whose states output l (listen) or d (doze), with transitions on e (equations) and p (pictures); for example e e p e p → l and e e e → d]

You: listen or doze

Me: equations or pictures

Page 7: Grammar Induction: Techniques and Theory


Example: (the prisoner’s dilemma)

• Each prisoner can admit (a) or stay silent (s)

• If both admit: 3 years each;

• If A admits but not B, A=0 years, B=5 years;

• If B admits but not A, B=0 years, A=5 years;

• If neither admits: 1 year each.

Page 8: Grammar Induction: Techniques and Theory

Gains (A, B), as negated years of prison:

              B: a          B: s
A: a        (-3, -3)      (0, -5)
A: s        (-5, 0)       (-1, -1)

Page 9: Grammar Induction: Techniques and Theory


• Here we play an iterated version against an opponent that follows a rational strategy.

• Gain function: limit of means.

• A game is a word in (His_moves × My_moves)*.

Page 10: Grammar Induction: Techniques and Theory


The general problem

• We suppose that the strategy of the opponent is given by a deterministic finite automaton.

• Can we imagine an optimal strategy?

Page 11: Grammar Induction: Techniques and Theory


Suppose we know the opponent’s strategy:

• Then (game theory):

• Consider the opponent’s graph, in which we weight the edges by our own gain.

Page 12: Grammar Induction: Techniques and Theory

[Figure: the opponent’s automaton, with each edge weighted by our own gain (-3, 0, -5, -1, following the payoff matrix)]

Page 13: Grammar Induction: Techniques and Theory


Then:

1 Find the cycle of maximum mean weight.

2 Find the best path leading to this cycle of maximum mean weight.

3 Follow the path and stay in the cycle.

All that is needed is to find the opponent’s automaton! (A sketch of step 1 follows.)
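Step 1 is a classical graph problem. Here is a minimal sketch, not the tutorial’s code, of Karp’s algorithm for the maximum mean-weight cycle; the dict encoding and the two-state example are illustrative assumptions.

# Karp: D[k][v] = max weight of a walk with exactly k edges ending in v,
# starting anywhere (equivalent to adding a 0-weight super-source).
def max_mean_cycle(graph):
    states = list(graph)
    n = len(states)
    NEG = float("-inf")
    D = [{v: NEG for v in states} for _ in range(n + 1)]
    for v in states:
        D[0][v] = 0.0
    for k in range(1, n + 1):
        for u in states:
            if D[k - 1][u] == NEG:
                continue
            for v, w in graph[u]:
                D[k][v] = max(D[k][v], D[k - 1][u] + w)
    best = NEG
    for v in states:
        if D[n][v] == NEG:
            continue
        best = max(best, min((D[n][v] - D[k][v]) / (n - k)
                             for k in range(n) if D[k][v] > NEG))
    return best

# Two states with prisoner's-dilemma gains: the mutual-silence self-loop
# has mean -1, the best achievable here.
graph = {"p": [("p", -1.0), ("q", 0.0)], "q": [("p", -5.0), ("q", -3.0)]}
print(max_mean_cycle(graph))  # -1.0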

Page 14: Grammar Induction: Techniques and Theory


[Figure: the weighted strategy graph, with the best path highlighted, leading into the cycle of maximum mean weight: Mean = -0.5]

Page 15: Grammar Induction: Techniques and Theory


Question

• Having seen a game of this opponent…

• Can we reconstruct his strategy?

Page 16: Grammar Induction: Techniques and Theory


Data (him, me): {aa, as, sa, aa, as, ss, ss, ss, sa}

HIM: a a s a a s s s s
ME:  a s a a s s s a

I play asa, his move is a.
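The table on the next page can be read off mechanically: each prefix of my moves is labelled with the opponent’s next move. A small sketch, with the game encoding assumed from the reconstruction above:

# Extract the learning data from one observed game: every prefix of my
# moves is paired with the opponent's next move.
game = [("a", "a"), ("a", "s"), ("s", "a"), ("a", "a"), ("a", "s"),
        ("s", "s"), ("s", "s"), ("s", "s"), ("s", "a")]  # (him, me) pairs

prefix = ""
for him, me in game:
    print(f"{prefix or 'λ'} -> {him}")  # his move, given my moves so far
    prefix += me
# prints: λ -> a, a -> a, as -> s, asa -> a, ..., asaasssa -> s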

Page 17: Grammar Induction: Techniques and Theory


λ → a
a → a
as → s
asa → a
asaa → a
asaas → s
asaass → s
asaasss → s
asaasssa → s

[Figure: the first state of the automaton under construction, labelled with output a]

Pages 18-31 replay this table while folding it, one observation at a time, into the opponent’s automaton: each prefix either follows an existing transition or creates a new state, and states with identical observed behaviour are merged. The result is a three-state machine: the opponent starts by playing a and mirrors my last move, but once I have played s twice in a row he plays s forever.

Page 32: Grammar Induction: Techniques and Theory


How do we get hold of the learning data?

a) through observation

b) through exploration

Page 33: Grammar Induction: Techniques and Theory

An open problem: the strategy is probabilistic

[Figure: the same three-state machine, but each state plays a with some probability (70%, 50%, 20%) and s otherwise, with transitions on a and s]

Page 34: Grammar Induction: Techniques and Theory

Tit for Tat

[Figure: the two-state tit-for-tat machine: play a while the opponent plays a, answer s with s]

Page 35: Grammar Induction: Techniques and Theory


2 Specificities of grammatical inference

Grammatical inference consists (roughly) in finding the (a) grammar or automaton that has produced a given set of strings (sequences, trees, terms, graphs).

Page 36: Grammar Induction: Techniques and Theory


The goal/idea

• The ancient Greeks:

A whole is more than the sum of its parts

• Gestalt theory:

A whole is different from the sum of its parts

Page 37: Grammar Induction: Techniques and Theory


Better said

• There are cases where the data cannot be analyzed by considering it in bits

• There are cases where intelligibility of the pattern is important

Page 38: Grammar Induction: Techniques and Theory

What do people know about formal language theory? (anywhere from nothing to lots)

Page 39: Grammar Induction: Techniques and Theory


A small reminder on formal language theory

• Chomsky hierarchy

• Pros and cons of grammars

Page 40: Grammar Induction: Techniques and Theory


A crash course in Formal language theory

• Symbols

• Strings

• Languages

• Chomsky hierarchy

• Stochastic languages

Page 41: Grammar Induction: Techniques and Theory


Symbols are taken from some alphabet Σ.

Strings are sequences of symbols from Σ.

Page 42: Grammar Induction: Techniques and Theory


Languages are sets of strings over Σ, i.e. subsets of Σ*.

Page 43: Grammar Induction: Techniques and Theory


Special languages

• Are recognised by finite state automata

• Are generated by grammars

Page 44: Grammar Induction: Techniques and Theory

[Figure: a three-state DFA over {a, b}]

DFA: Deterministic Finite State Automaton

Page 45: Grammar Induction: Techniques and Theory

[The same DFA: following the transitions on a b a b ends in an accepting state:]

abab ∈ L
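Membership testing in a DFA is a single left-to-right pass over the string. A minimal sketch, with a made-up two-letter machine standing in for the one drawn on the slide:

# A small DFA as a transition table; this one accepts (ab)*.
delta = {(0, "a"): 1, (1, "b"): 0,
         (0, "b"): 2, (1, "a"): 2,
         (2, "a"): 2, (2, "b"): 2}   # state 2 is a sink
accepting = {0}

def accepts(w, q=0):
    for c in w:
        q = delta[(q, c)]
    return q in accepting

print(accepts("abab"))  # True: abab ∈ L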

Page 46: Grammar Induction: Techniques and Theory


What is a context-free grammar?

A 4-tuple (Σ, S, V, P) such that:

– Σ is the alphabet;

– V is a finite set of non terminals;

– S is the start symbol;

– P ⊆ V × (V∪Σ)* is a finite set of rules.

Page 47: Grammar Induction: Techniques and Theory


Example of a grammar: the Dyck1 grammar

– (Σ, S, V, P)

– Σ = {a, b}

– V = {S}

– P = {S → aSbS, S → λ }
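To make the grammar concrete, here is a small sketch (names and bounds are my own) that enumerates the strings Dyck1 generates up to a length bound, by always expanding the leftmost S with one of the two rules:

def dyck1(max_len):
    done, frontier = set(), {"S"}
    while frontier:
        nxt = set()
        for s in frontier:
            i = s.find("S")
            if i < 0:
                done.add(s)          # sentential form is all terminal
                continue
            for rhs in ("aSbS", ""):  # the two rules of the grammar
                t = s[:i] + rhs + s[i + 1:]
                if len(t.replace("S", "")) <= max_len:  # prune long ones
                    nxt.add(t)
        frontier = nxt
    return sorted(done, key=lambda s: (len(s), s))

print(dyck1(4))  # ['', 'ab', 'aabb', 'abab']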

Page 48: Grammar Induction: Techniques and Theory


Derivations and derivation trees

S → aSbS

→ aaSbSbS

→ aabSbS

→ aabbS

→ aabb

[Derivation tree for aabb: the root S branches into a S b S; the inner S expands to a S b S with both of its S children going to λ, and the root’s trailing S goes to λ]

Page 49: Grammar Induction: Techniques and Theory


Chomsky Hierarchy

• Level 0: no restriction

• Level 1: context-sensitive

• Level 2: context-free

• Level 3: regular

Page 50: Grammar Induction: Techniques and Theory


Chomsky Hierarchy

• Level 0: whatever Turing machines can do

• Level 1: context-sensitive
– {aⁿbⁿcⁿ : n∈ℕ}
– {aⁿbᵐcⁿdᵐ : n,m∈ℕ}
– {uu : u∈Σ*}

• Level 2: context-free
– {aⁿbⁿ : n∈ℕ}
– brackets

• Level 3: regular
– regular expressions (GREP)

Page 51: Grammar Induction: Techniques and Theory


The membership problem

• Level 0: undecidable

• Level 1: decidable

• Level 2: polynomial

• Level 3: linear

Page 52: Grammar Induction: Techniques and Theory


The equivalence problem

• Level 0: undecidable

• Level 1: undecidable

• Level 2: undecidable

• Level 3: polynomial, but only when the representation is a DFA.

Page 53: Grammar Induction: Techniques and Theory

[Figure: a three-state PFA over {a,b}, with transition and stopping probabilities such as ¼, ⅓, ½, ⅔, ¾]

PFA: Probabilistic Finite (state) Automaton

Page 54: Grammar Induction: Techniques and Theory

[Figure: a three-state DPFA over {a,b}; each state carries a stopping probability (0.1, 0.3, 0.35) and each transition a probability (0.65, 0.9, 0.7, 0.3)]

DPFA: Deterministic Probabilistic Finite (state) Automaton
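A DPFA assigns a probability to a string by multiplying the transition probabilities along its unique path and ending with the stopping probability of the last state. A sketch with an invented machine (each state’s outgoing probabilities plus its stopping probability sum to 1):

trans = {(0, "a"): (1, 0.65), (0, "b"): (2, 0.25),
         (1, "a"): (0, 0.35), (1, "b"): (2, 0.35),
         (2, "a"): (2, 0.40), (2, "b"): (2, 0.30)}
stop = {0: 0.10, 1: 0.30, 2: 0.30}   # per-state stopping probability

def prob(w, q=0):
    p = 1.0
    for c in w:
        q, t = trans[(q, c)]   # deterministic: one successor per symbol
        p *= t
    return p * stop[q]

print(prob("ab"))  # 0.65 * 0.35 * 0.3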

Page 55: Grammar Induction: Techniques and Theory


What is nice with grammars?

• Compact representation

• Recursivity

• Says how a string belongs, not just if it belongs

• Graphical representations (automata, parse trees)

Page 56: Grammar Induction: Techniques and Theory


What is not so nice with grammars?

• Even the easiest class (level 3) contains SAT, Boolean functions, parity functions…

• Noise is very harmful:

– think about adding edit noise to the language {w : |w|a ≡ 0 (mod 2) ∧ |w|b ≡ 0 (mod 2)}

Page 57: Grammar Induction: Techniques and Theory

The field

[Diagram: grammatical inference sits at the intersection of inductive inference, pattern recognition, and machine learning, with links to computational linguistics, computational biology, and web technologies]

Page 58: Grammar Induction: Techniques and Theory


The data

• Strings, trees, terms, graphs

• Structural objects

• Basically the same gap in information as in programming between tables/arrays and data structures

Page 59: Grammar Induction: Techniques and Theory


Alternatives to grammatical inference

• 2 steps:

– Extract features from the strings

– Use a very good method over ℝⁿ.

Page 60: Grammar Induction: Techniques and Theory


Examples of strings

A string in Gaelic and its translation to English:

• Tha thu cho duaichnidh ri èarràirde de a’ coisich deas damh

• You are as ugly as the north end of a southward traveling ox

Page 61: Grammar Induction: Techniques and Theory


Page 62: Grammar Induction: Techniques and Theory


Page 63: Grammar Induction: Techniques and Theory


>A BAC=41M14 LIBRARY=CITB_978_SKBAAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTAGTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAATGGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCAGAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGACACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCTGGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGGGCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCAGGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAACAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACACACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGATGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAATGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATAAAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCATCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAGTATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTACTAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGAGGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGATGGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAAAAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCCTGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGACTAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCATGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT

Page 64: Grammar Induction: Techniques and Theory



Page 65: Grammar Induction: Techniques and Theory


Page 66: Grammar Induction: Techniques and Theory


Page 67: Grammar Induction: Techniques and Theory


<book>
  <part>
    <chapter>
      <sect1/>
      <sect1>
        <orderedlist numeration="arabic">
          <listitem/>
          <f:fragbody/>
        </orderedlist>
      </sect1>
    </chapter>
  </part>
</book>

Page 68: Grammar Induction: Techniques and Theory


<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
  <!ELEMENT pagina (titulus?, poema)>
  <!ELEMENT titulus (#PCDATA)>
  <!ELEMENT auctor (praenomen, cognomen, nomen)>
  <!ELEMENT praenomen (#PCDATA)>
  <!ELEMENT nomen (#PCDATA)>
  <!ELEMENT cognomen (#PCDATA)>
  <!ELEMENT poema (versus+)>
  <!ELEMENT versus (#PCDATA)>
]>
<pagina>
  <titulus>Catullus II</titulus>
  <auctor>
    <praenomen>Gaius</praenomen>
    <nomen>Valerius</nomen>
    <cognomen>Catullus</cognomen>
  </auctor>

Page 69: Grammar Induction: Techniques and Theory


Page 70: Grammar Induction: Techniques and Theory


A logic program learned by GIFT

color_blind(Arg1) :-
  start(Arg1, X), p11(Arg1, X).

start(X, X).

p11(Arg1, P) :-
  mother(M, P), p4(Arg1, M).

p4(Arg1, X) :-
  woman(X), father(F, X), p3(Arg1, F).

p4(Arg1, X) :-
  woman(X), mother(M, X), p4(Arg1, M).

p3(Arg1, X) :-
  man(X), color_blind(X).

Page 71: Grammar Induction: Techniques and Theory


3 Hardness of the task

– It is one thing to build algorithms, another to be able to state that they work.

– Some questions:

– Does this algorithm work?

– Do I have enough learning data?

– Do I need some extra bias?

– Is this algorithm better than the other?

– Is this problem easier than the other?

Page 72: Grammar Induction: Techniques and Theory


Alternatives to answer these questions:

– Use benchmarks

– Solve a real problem

– Prove things

Page 73: Grammar Induction: Techniques and Theory


Theory

• Because you may want to be able to say something more than « seems to work in practice ».

Page 74: Grammar Induction: Techniques and Theory


Convergence

• Does my algorithm converge, in some sense, to a best solution?

• To be able to answer, we have to admit the existence of a best solution.

Page 75: Grammar Induction: Techniques and Theory


Issues

• Get close to the best?

– Metrics

– Distributions over strings

• PAC-related models and similar: very negative results

Page 76: Grammar Induction: Techniques and Theory


Identification in the limit

The setting:

– a class of languages L and a class of grammars G;

– presentations f ∈ Pres ⊆ (ℕ → X), with a naming function yields such that f(ℕ) = g(ℕ) ⇒ yields(f) = yields(g);

– a learner ϕ, which succeeds on a presentation f when L(ϕ(f)) = yields(f).

Page 77: Grammar Induction: Techniques and Theory


Reading a presentation f1 f2 … fn …, the learner outputs hypotheses h1 h2 … hn …

L is identifiable in the limit in terms of G from Pres iff

∀L∈L, ∀f∈Pres(L): ∃n such that ∀i ≥ n, hi ≡ hn and L(hi) = L.

Page 78: Grammar Induction: Techniques and Theory


He did not want to compose another Quixote (which would be easy) but the Quixote itself. Needless to say, he never contemplated a mechanical transcription of the original; he did not propose to copy it. His admirable ambition was to produce a number of pages which coincided, word for word and line for line, with those of Miguel de Cervantes.

[…]

"My undertaking is not difficult, essentially," I read in another part of the letter. "I should only have to be immortal to carry it out."

Jorge Luis Borges (1899-1986), "Pierre Menard, Author of the Quixote" (The Garden of Forking Paths), Ficciones

Page 79: Grammar Induction: Techniques and Theory


4 Algorithmic ideas

Page 80: Grammar Induction: Techniques and Theory


The space of GI problems

• Type of input (strings)

• Presentation of input (batch)

• Hypothesis space (subset of the regular grammars)

• Success criteria (identification in the limit)

Page 81: Grammar Induction: Techniques and Theory


Types of input

Strings: the cat hates the dog

Structural examples: parse trees (here over the, cat, hates, the, dog)

Examples labeled (+) or (-)

Graphs

Page 82: Grammar Induction: Techniques and Theory


Types of input - oracles

• Membership queries

– Is string s in the target language?

• Equivalence queries

– Is my hypothesis correct?

– If not, provide a counter-example

• Subset queries

– Is the language of my hypothesis a subset of the target language?

Page 83: Grammar Induction: Techniques and Theory


Presentation of input

• Arbitrary order

• Shortest to longest

• All positive and negative examples up to some length

• Sampled according to some probability distribution

Page 84: Grammar Induction: Techniques and Theory


Presentation of input

• Text presentation

– A presentation of all strings in the target language

• Complete presentation (informant)

– A presentation of all strings over the alphabet of the target language labeled as + or -

Page 85: Grammar Induction: Techniques and Theory


Hypothesis space

• Regular grammars

– A welter of subclasses

• Context free grammars

– Fewer subclasses

• Hyper-edge replacement graph grammars

Page 86: Grammar Induction: Techniques and Theory


Success criteria

• Identification in the limit

– Text or informant presentation

– After each example, learner guesses language

– At some point, guess is correct and never changes

• PAC learning

Page 87: Grammar Induction: Techniques and Theory


Theorems due to Gold

• The good news

– Any recursively enumerable class of languages can be learned in the limit from an informant (Gold, 1967)

• The bad news

– A language class is superfinite if it includes all finite languages and at least one infinite language

– No superfinite class of languages can be learned in the limit from a text (Gold, 1967)

– That includes regular and context-free

Page 88: Grammar Induction: Techniques and Theory


A picture

[Diagram: learnable settings arranged from poor languages / little information to rich languages / a lot of information:]

– sub-classes of regular languages, from positive examples
– context-free languages, from positive examples
– DFA, from positive and negative examples
– DFA, from queries
– mildly context-sensitive languages, from queries

Page 89: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 90: Grammar Induction: Techniques and Theory


4.1 RPNI

• Regular Positive and Negative Grammatical Inference

Identifying regular languages in polynomial time

Jose Oncina & Pedro García 1992

Page 91: Grammar Induction: Techniques and Theory


• It is a state merging algorithm;

• It identifies any regular language in the limit;

• It works in polynomial time;

• It admits polynomial characteristic sets.

Page 92: Grammar Induction: Techniques and Theory


The algorithm

function rmerge(A, p, q)
  A = merge(A, p, q)
  while ∃a∈Σ, ∃r: p,q ∈ δA(r,a), p ≠ q do
    A = rmerge(A, p, q)

Page 93: Grammar Induction: Techniques and Theory


A = PTA(X+); Fr = {δ(q0,a) : a∈Σ}; K = {q0}
while Fr ≠ ∅ do
  choose q from Fr
  if ∃p∈K : L(rmerge(A,p,q)) ∩ X- = ∅
    then A = rmerge(A,p,q)
    else K = K ∪ {q}
  Fr = {δ(q,a) : q∈K, a∈Σ} \ K
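The same loop, as a compact runnable sketch (my own encoding, not the authors’ implementation): PTA states are the prefixes of X+, a merge unions two blocks of a state partition and folds blocks together until the quotient automaton is deterministic, and the merge is kept only if no string of X- becomes accepted.

def rpni(xpos, xneg):
    states = sorted({w[:i] for w in xpos for i in range(len(w) + 1)},
                    key=lambda s: (len(s), s))          # PTA states
    trans = {(w[:i], w[i]): w[:i + 1]
             for w in xpos for i in range(len(w))}      # PTA transitions
    blocks = {s: i for i, s in enumerate(states)}       # one block per state
    red = [states[0]]                                   # promoted states
    for q in states[1:]:
        if any(blocks[q] == blocks[r] for r in red):
            continue                                    # already merged away
        for p in red:
            cand = try_merge(blocks, trans, p, q)
            if not any(accepts(cand, trans, xpos, w) for w in xneg):
                blocks = cand                           # keep the merge
                break
        else:
            red.append(q)                               # promote q

    return blocks

def try_merge(blocks, trans, p, q):
    b = dict(blocks)
    def union(x, y):                                    # block y joins block x
        for k in b:
            if b[k] == y:
                b[k] = x
    union(b[p], b[q])
    while True:                                         # fold nondeterminism
        succ, clash = {}, None
        for (r, a), s in trans.items():
            key = (b[r], a)
            if key in succ and succ[key] != b[s]:
                clash = (succ[key], b[s])
                break
            succ[key] = b[s]
        if clash is None:
            return b
        union(*clash)

def accepts(blocks, trans, xpos, w):                    # quotient automaton
    succ = {}
    for (r, a), s in trans.items():
        succ.setdefault((blocks[r], a), set()).add(blocks[s])
    cur = {blocks[""]}
    for a in w:
        cur = set().union(*(succ.get((c, a), set()) for c in cur))
        if not cur:
            return False
    return any(blocks[f] in cur for f in xpos)

X_pos = ["", "aaa", "aaba", "ababa", "bb", "bbaaa"]
X_neg = ["aa", "ab", "aaaa", "ba"]
print(rpni(X_pos, X_neg))   # prefixes grouped into the learned states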

Page 94: Grammar Induction: Techniques and Theory

X+ = {λ, aaa, aaba, ababa, bb, bbaaa}
X- = {aa, ab, aaaa, ba}

[Figure: the prefix-tree acceptor PTA(X+), states numbered 1 to 15]

Page 95: Grammar Induction: Techniques and Theory

Try to merge 2 and 1

[Figure: the PTA, all 15 states still separate; X- = {aa, ab, aaaa, ba}]

Page 96: Grammar Induction: Techniques and Theory

Needs more merging for determinization

[Figure: state 2 merged into 1; the two a-successors of {1,2} must now be merged as well]

Page 97: Grammar Induction: Techniques and Theory

But now string aaaa is accepted, so the merge must be rejected

[Figure: after determinization the states are {1,2,4,7}, {3,5,8}, 6, {9,11}, 10, 12, 13, 14, 15, and aaaa ∈ X- is accepted]

Page 98: Grammar Induction: Techniques and Theory

Try to merge 3 and 1

[Figure: back to the PTA, now attempting to merge state 3 into 1]

Page 99: Grammar Induction: Techniques and Theory

Requires to merge 6 with {1,3}

[Figure: states {1,3} merged; determinization requires merging 6 into {1,3}]

Page 100: Grammar Induction: Techniques and Theory

And now to merge 2 with 10

[Figure: states {1,3,6}; determinization continues with 2 and 10]

Page 101: Grammar Induction: Techniques and Theory

And now to merge 4 with 13

[Figure: states {1,3,6}, {2,10}; determinization continues with 4 and 13]

Page 102: Grammar Induction: Techniques and Theory

And finally to merge 7 with 15

[Figure: states {1,3,6}, {2,10}, {4,13}; the last forced merge is 7 with 15]

Page 103: Grammar Induction: Techniques and Theory

No counter-example is accepted, so the merges are kept

[Figure: current automaton with states {1,3,6}, {2,10}, {4,13}, 5, {7,15}, 8, 9, 11, 12, 14]

Page 104: Grammar Induction: Techniques and Theory

Next possible merge to be checked is {4,13} with {1,3,6}

[Figure: the same automaton]

Page 105: Grammar Induction: Techniques and Theory

More merging for determinization is needed

[Figure: after the merge, the states are {1,3,4,6,13}, {2,10}, 5, {7,15}, 8, 9, 11, 12, 14]

Page 106: Grammar Induction: Techniques and Theory

But now aa is accepted

[Figure: determinization has collapsed the states to {1,3,4,6,8,13}, {2,7,10,11,15}, 5, 9, 12, 14, and aa ∈ X- is accepted: the merge is rejected]

Page 107: Grammar Induction: Techniques and Theory

So we try {4,13} with {2,10}

[Figure: back to states {1,3,6}, {2,10}, {4,13}, 5, {7,15}, 8, 9, 11, 12, 14]

Page 108: Grammar Induction: Techniques and Theory

After determinizing, the negative string aa is again accepted

[Figure: the states collapse to {1,3,6}, {2,4,7,10,13,15}, {5,8}, {9,11}, 12, 14: rejected]

Page 109: Grammar Induction: Techniques and Theory

So we try 5 with {1,3,6}

[Figure: back to the previous automaton]

Page 110: Grammar Induction: Techniques and Theory

But again we accept ab

[Figure: the states collapse to {1,3,5,6,12}, {2,9,10,14}, {4,13}, {7,15}, 8, 11, and ab ∈ X- is accepted: rejected]

Page 111: Grammar Induction: Techniques and Theory

So we try 5 with {2,10}

[Figure: back to the previous automaton]

Page 112: Grammar Induction: Techniques and Theory

Which is OK. So the next possible merge is {7,15} with {1,3,6}

[Figure: states {1,3,6}, {2,5,10}, {4,9,13}, {7,15}, {8,12}, {11,14}]

Page 113: Grammar Induction: Techniques and Theory

Which is OK. Now try to merge {8,12} with {1,3,6,7,15}

[Figure: states {1,3,6,7,15}, {2,5,10}, {4,9,13}, {8,12}, {11,14}]

Page 114: Grammar Induction: Techniques and Theory

And ab is accepted

[Figure: the states collapse to {1,3,6,7,8,12,15}, {2,5,10,11,14}, {4,9,13}: rejected]

Page 115: Grammar Induction: Techniques and Theory

Now try to merge {8,12} with {4,9,13}

[Figure: states {1,3,6,7,15}, {2,5,10}, {4,9,13}, {8,12}, {11,14}]

Page 116: Grammar Induction: Techniques and Theory

This is OK, and no more merges are possible, so the algorithm halts.

[Figure: the final automaton, with states {1,3,6,7,11,14,15}, {2,5,10}, {4,8,9,12,13}]

Page 117: Grammar Induction: Techniques and Theory


Definitions

• Let ≤ be the length-lex ordering over Σ*.

• Let Pref(L) be the set of all prefixes of strings in some language L.

Page 118: Grammar Induction: Techniques and Theory


Short prefixes

Sp(L) = {u ∈ Pref(L) : ∀v, δ(q0,v) = δ(q0,u) ⇒ u ≤ v}

• There is one short prefix per useful state

[Figure: a three-state DFA (states 0, 1, 2) over {a,b} for which Sp(L) = {λ, a}]

Page 119: Grammar Induction: Techniques and Theory


Kernel-sets

• N(L) = {ua ∈ Pref(L) : u ∈ Sp(L), a ∈ Σ} ∪ {λ}

• There is an element in the kernel-set for each useful transition

[Figure: the same DFA, for which N(L) = {λ, a, b, ab}]

Page 120: Grammar Induction: Techniques and Theory


A characteristic sample

• A sample is characteristic (for RPNI) if

– ∀x ∈ Sp(L), ∃u: xu ∈ X+

– ∀x ∈ Sp(L), ∀y ∈ N(L):
  δ(q0,x) ≠ δ(q0,y) ⇒ ∃z ∈ Σ*: (xz ∈ X+ ∧ yz ∈ X-) ∨ (xz ∈ X- ∧ yz ∈ X+)

Page 121: Grammar Induction: Techniques and Theory


About characteristic samples

• If you add more strings to a characteristic sample, it is still characteristic;

• There can be many different characteristic samples;

• Change the ordering (or the exploring function in RPNI) and the characteristic sample will change.

Page 122: Grammar Induction: Techniques and Theory


Conclusion

• RPNI identifies any regular language in the limit;

• RPNI works in polynomial time: its complexity is in O(‖X+‖³ · ‖X-‖);

• There are many significant variants of RPNI;

• RPNI can be extended to other classes of grammars.

Page 123: Grammar Induction: Techniques and Theory


Open problems

• RPNI’s complexity is not a tight upper bound. Find the correct complexity.

• The definition of the characteristic set is not tight either. Find a better definition.

Page 124: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 125: Grammar Induction: Techniques and Theory


4.2 The k-reversible languages

• The class was proposed by Angluin (1982).

• The class is identifiable in the limit from text.

• The class is composed of the regular languages that can be accepted by a DFA whose reversal is deterministic with a look-ahead of k.

Page 126: Grammar Induction: Techniques and Theory


Let A = (Σ, Q, δ, I, F) be an NFA. We denote by AT = (Σ, Q, δT, F, I) the reversal automaton, with:

δT(q,a)={q’∈Q: q∈δ(q’,a)}
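In code the construction just reverses every transition and swaps I and F; a short sketch with an assumed dict-of-sets encoding:

# trans: dict (state, symbol) -> set of successor states
def reverse_nfa(trans, initial, final):
    rev = {}
    for (q, a), succs in trans.items():
        for q2 in succs:
            rev.setdefault((q2, a), set()).add(q)   # reverse each edge
    return rev, set(final), set(initial)            # I and F swapped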

Page 127: Grammar Induction: Techniques and Theory

[Figure: a five-state automaton A (states 0-4, over {a,b}) and its reversal AT]

Page 128: Grammar Induction: Techniques and Theory


Some definitions

• u is a k-successor of q if |u| = k and δ(q,u) ≠ ∅.

• u is a k-predecessor of q if |u| = k and δT(q,uT) ≠ ∅.

• λ is a 0-successor and 0-predecessor of any state.

Page 129: Grammar Induction: Techniques and Theory

[Figure: the automaton A again]

• aa is a 2-successor of 0 and 1, but not of 3.

• a is a 1-successor of 3.

• aa is a 2-predecessor of 3, but not of 1.

Page 130: Grammar Induction: Techniques and Theory


An NFA is deterministic with look-ahead k iff, ∀q,q'∈Q with q ≠ q' such that q,q'∈I or q,q'∈δ(q'',a):

(u is a k-successor of q) ∧ (v is a k-successor of q') ⇒ u ≠ v

Page 131: Grammar Induction: Techniques and Theory


Prohibited:

[Figure: two distinct states, both reached on the same symbol a, sharing a common k-successor u with |u| = k]

Page 132: Grammar Induction: Techniques and Theory

Example

This automaton is not deterministic with look-ahead 1, but is deterministic with look-ahead 2.

[Figure: the five-state automaton again]

Page 133: Grammar Induction: Techniques and Theory


K-reversible automata

• A is k-reversible if A is deterministic and AT is deterministic with look-ahead k.

• Example

[Figure: a three-state automaton that is deterministic, and whose reversal is deterministic with look-ahead 1]

Page 134: Grammar Induction: Techniques and Theory


Violation of k-reversibility

• Two states q, q' violate the k-reversibility condition iff

– they violate the determinism condition: q,q' ∈ δ(q'',a);

or

– they violate the look-ahead condition:
  • q,q' ∈ F and ∃u∈Σᵏ: u is a k-predecessor of both;
  • ∃a∈Σ, ∃u∈Σᵏ: δ(q,a) = δ(q',a) and u is a k-predecessor of both q and q'.

Page 135: Grammar Induction: Techniques and Theory

13ICML 2006, Grammatical Inference135

Learning k-reversible automata

• Key idea: the order in which the merges are performed does not matter!

• Just merge states that do not comply with the conditions for k-reversibility.

Page 136: Grammar Induction: Techniques and Theory


K-RL Algorithm (φk-RL)

Data: k ∈ ℕ, X a sample of a k-reversible language L

A = PTA(X)
while ∃q,q' violating k-reversibility do
  A = merge(A, q, q')
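For k = 0 (Angluin’s zero-reversible case) the violators are especially simple, which makes the loop easy to sketch in full (encoding mine): merge any two final states, any two states sharing an a-successor, and any two a-successors of one state, until a fixpoint.

def learn_zero_reversible(sample):
    states = sorted({w[:i] for w in sample for i in range(len(w) + 1)},
                    key=lambda s: (len(s), s))           # PTA states
    trans = {(w[:i], w[i]): w[:i + 1]
             for w in sample for i in range(len(w))}     # PTA transitions
    parent = {s: s for s in states}                      # union-find
    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s
    def union(s, t):
        parent[find(t)] = find(s)
    changed = True
    while changed:
        changed = False
        groups = [{find(w) for w in sample}]             # all final states
        fwd, bwd = {}, {}
        for (p, a), q in trans.items():
            fwd.setdefault((find(p), a), set()).add(find(q))
            bwd.setdefault((find(q), a), set()).add(find(p))
        groups += list(fwd.values()) + list(bwd.values())
        for g in groups:                                 # merge each group
            first, *rest = sorted(g)
            for other in rest:
                if find(other) != find(first):
                    union(first, other)
                    changed = True
    blocks = {}
    for s in states:
        blocks.setdefault(find(s), []).append(s)
    return list(blocks.values())

print(learn_zero_reversible(["ab", "aab", "aaab"]))
# [['', 'a', 'aa', 'aaa'], ['ab', 'aab', 'aaab']] -> the language a*b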

Page 137: Grammar Induction: Techniques and Theory

Let X = {a, aa, abba, abbbba}; k = 2

[Figure: PTA(X). The final states abba and abbbba share the 2-predecessor u = ba: violators, so they are merged]

Page 138: Grammar Induction: Techniques and Theory

[Figure: after that merge, abb and abbbb lead on a to the same state and share the 2-predecessor u = bb: violators, so they are merged too]

Page 139: Grammar Induction: Techniques and Theory

[Figure: the resulting automaton, which accepts a, aa, and a b²ⁿ a for n ≥ 1]

Page 140: Grammar Induction: Techniques and Theory


Properties (1)

• ∀k≥0, ∀X, φk-RL(X) is a k-reversible language.

• L(φk-RL(X)) is the smallest k-reversible language that contains X.

• The class Lk-RL is identifiable in the limit from text.

Page 141: Grammar Induction: Techniques and Theory


Properties (2)

• A regular language L is k-reversible iff, ∀u1,u2,v with |v| = k:

(u1v)⁻¹L ∩ (u2v)⁻¹L ≠ ∅ ⇒ (u1v)⁻¹L = (u2v)⁻¹L

(if two prefixes end in the same k symbols and share one continuation, they are Nerode-equivalent)

Page 142: Grammar Induction: Techniques and Theory


Properties (3)

• Lk-RL(X) ⊂ L(k+1)-RL(X)

• Lk-TSS(X) ⊂ L(k-1)-RL(X)

Page 143: Grammar Induction: Techniques and Theory


Properties (4)

The time complexity is O(k·‖X‖³).

The space complexity is O(‖X‖).

The algorithm is not incremental.

Page 144: Grammar Induction: Techniques and Theory


Properties (5): polynomial aspects

• Polynomial characteristic sets

• Polynomial update time

• But not necessarily a polynomial number of mind changes

Page 145: Grammar Induction: Techniques and Theory


Extensions

• Sakakibara built an extension for context-free grammars whose tree language is k-reversible.

• Marion & Besombes propose an extension to tree languages.

• Different authors propose to learn these automata and then estimate the probabilities as an alternative to learning stochastic automata.

Page 146: Grammar Induction: Techniques and Theory


Exercises

• Construct a language L that is not k-reversible for any k ≥ 0.

• Prove that the class of k-reversible languages is not in TxtEx.

• Run φk-RL on X={aa, aba, abb, abaaba, baaba} for k=0,1,2,3

Page 147: Grammar Induction: Techniques and Theory


Solution (idea)

• Lk = {aⁱ : i ≤ k}

• Then for each k: Lk is k-reversible but not (k-1)-reversible.

• And ∪k Lk = a*

• So there is an accumulation point…

Page 148: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 149: Grammar Induction: Techniques and Theory


4.3 Active Learning: learning DFA from membership and

equivalence queries: the L* algorithm

Page 150: Grammar Induction: Techniques and Theory


The classes C and H

• sets of examples

• representations of these sets

• the computation of L(x) (and h(x)) must take place in time polynomial in |x|.

Page 151: Grammar Induction: Techniques and Theory


Correct learning

A class C is identifiable with a polynomial number of queries of type T if there exists an algorithm φ that:

1) ∀L∈C identifies L with a polynomial number of queries of type T;

2) does each update in time polynomial in |f| and in Σ|xi|, where {xi} are the counter-examples seen so far.

Page 152: Grammar Induction: Techniques and Theory


Algorithm L*

• Angluin’s papers

• Some talks by Rivest

• Kearns and Vazirani

• Balcazar, Diaz, Gavaldà & Watanabe

Page 153: Grammar Induction: Techniques and Theory


Some references

• Learning regular sets from queries and counter-examples, D. Angluin, Information and Computation, 75, 87-106, 1987.

• Queries and Concept learning, D. Angluin, Machine Learning, 2, 319-342, 1988.

• Negative results for Equivalence Queries, D. Angluin, Machine Learning, 5, 121-150, 1990.

Page 154: Grammar Induction: Techniques and Theory


The Minimal Adequate Teacher

• You are allowed:

– strong equivalence queries;

– membership queries.

Page 155: Grammar Induction: Techniques and Theory


General idea of L*

• find a consistent table (representing a DFA);

• submit it as an equivalence query;

• use counterexample to update the table;

• submit membership queries to make the table complete;

• Iterate.

Page 156: Grammar Induction: Techniques and Theory


An observation table

[Observation table: rows λ, a, ab, aab; experiment columns λ, a; each entry is 1 or 0 according to whether the concatenation is in L]

Page 157: Grammar Induction: Techniques and Theory


The rows are indexed by the states (S), the test set, plus the transitions (T); the columns are indexed by the experiments (E).

[The same observation table, with rows λ, a in S, rows ab, aab in T, and columns λ, a in E]

Page 158: Grammar Induction: Techniques and Theory


Meaning

δ(q0, λ·λ) ∈ F ⇔ λ ∈ L

[The table entry at row λ, column λ is 1]

Page 159: Grammar Induction: Techniques and Theory


δ(q0, ab·a) ∉ F ⇔ aba ∉ L

[The table entry at row ab, column a is 0]

Page 160: Grammar Induction: Techniques and Theory


Equivalent prefixes

The rows for λ and ab are equal, hence δ(q0,λ) = δ(q0,ab)

Page 161: Grammar Induction: Techniques and Theory


Building a DFA from a table

[Each distinct row becomes a state of the DFA; here the rows for λ and a give two states]

Page 162: Grammar Induction: Techniques and Theory


[Figure: the DFA read off the table, with a- and b-transitions taken from the rows of T]

Page 163: Grammar Induction: Techniques and Theory

Some rules

– The set S is prefix-closed;

– the set E is suffix-closed;

– SΣ \ S = T.

Page 164: Grammar Induction: Techniques and Theory

An incomplete table

[The same kind of table, but with some entries still missing]

Page 165: Grammar Induction: Techniques and Theory


Good idea

We can complete the table by making membership queries...

Membership query: is uv ∈ L?

Page 166: Grammar Induction: Techniques and Theory


A table is closed if every row of T corresponds to some row in S.

[Example: a table that is not closed]

Page 167: Grammar Induction: Techniques and Theory


And a table that is not closed

[Figure: the automaton built from such a table is missing a target state for the unmatched row of T]

Page 168: Grammar Induction: Techniques and Theory


What do we do when we have a table that is not closed?

• Let s be the row (of T) that does not appear in S.

• Add s to S and, ∀a∈Σ, add sa to T.

Page 169: Grammar Induction: Techniques and Theory


An inconsistent table

[Table in which the rows for a and b are equal: are a and b really equivalent?]

Page 170: Grammar Induction: Techniques and Theory


A table is consistent if every pair of equal rows in S remains equal after appending any symbol:

row(s1) = row(s2) ⇒ ∀a∈Σ, row(s1a) = row(s2a)

Page 171: Grammar Induction: Techniques and Theory


What do we do when we have an inconsistent table?

• Let a∈Σ be such that row(s1) = row(s2) but row(s1a) ≠ row(s2a).

• If row(s1a) ≠ row(s2a), the rows must differ on some experiment e.

• Then add the experiment ae to the table.
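The two table tests are mechanical; a sketch (names and encoding assumed: T maps each prefix in S or in the frontier S·Σ to its row of membership bits):

def is_closed(T, S, SA):
    # every frontier row must equal some state row
    return all(any(T[t] == T[s] for s in S) for t in SA)

def is_consistent(T, S, alphabet):
    # equal state rows must stay equal after appending any symbol
    # (assumes S·Σ is in the domain of T, as L* maintains)
    for s1 in S:
        for s2 in S:
            if s1 < s2 and T[s1] == T[s2]:
                for a in alphabet:
                    if T[s1 + a] != T[s2 + a]:
                        return False
    return True

S, SA = ["", "b"], ["a", "ba", "bb"]
T = {"": (1,), "a": (1,), "b": (1,), "ba": (1,), "bb": (1,)}
print(is_closed(T, S, SA), is_consistent(T, S, "ab"))  # True True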

Page 172: Grammar Induction: Techniques and Theory


What do we do when we have a closed and consistent table ?

• We build the corresponding DFA

• We make an equivalence query!!!

Page 173: Grammar Induction: Techniques and Theory


What do we do if we get a counter-example?

• Let u be this counter-example

• ∀w∈Pref(u) do

– add w to S

– ∀a∈Σ such that wa∉Pref(u), add wa to T

Page 174: Grammar Induction: Techniques and Theory


Run of the algorithm

[Initial table: S = {λ}, T = {a, b}, E = {λ}, all entries 1. The table is now closed and consistent, so a one-state DFA that loops on a and b is built]

Page 175: Grammar Induction: Techniques and Theory


An equivalence query is made!

[The one-state DFA is submitted; the counter-example baa is returned]

Page 176: Grammar Induction: Techniques and Theory


[The prefixes of baa are added to S and their extensions to T; the resulting table is not consistent, because two equal rows are separated after appending a symbol]

Page 177: Grammar Induction: Techniques and Theory


[The distinguishing experiment is added as a new column; the table is now closed and consistent, and the conjectured DFA (states λ, ba, baa) classifies baa correctly]

Page 178: Grammar Induction: Techniques and Theory


Proof of the algorithm

Sketch only

Understanding the proof is important for further algorithms

Balcazar et al. is a good place for that.

Page 179: Grammar Induction: Techniques and Theory


Termination / Correctness

• For every regular language there is a unique minimal DFA that recognizes it.

• Given a closed and consistent table, one can generate a consistent DFA.

• A DFA consistent with a table has at least as many states as different rows in S.

• If the algorithm has built a table with n different rows in S, then it is the target.

Page 180: Grammar Induction: Techniques and Theory


Finiteness

• Each closure failure adds one different row to S.

• Each inconsistency failure adds one experiment, which also creates a new row in S.

• Each counterexample adds one different row to S.

Page 181: Grammar Induction: Techniques and Theory


Polynomial

• |E| ≤ n

• at most n-1 equivalence queries

• |membership queries| ≤ n(n-1)m where m is the length of the longest counter-example returned by the oracle

Page 182: Grammar Induction: Techniques and Theory


Conclusion

• With an MAT you can learn DFA

– but also a variety of other classes of grammars;

– it is difficult to see how powerful an MAT really is;

– probably as much as PAC learning.

– It is easy to find a class and a set of queries and provide an algorithm that learns with them;

– it is more difficult for the result to be meaningful.

• Discussion: why are these queries meaningful?

Page 183: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 184: Grammar Induction: Techniques and Theory


4.4 SEQUITUR

(http://sequence.rutgers.edu/sequitur/)

(Nevill-Manning & Witten, 97)

Idea: construct a CF grammar G from a very long string w, such that L(G) = {w}

– No generalization

– Linear time (+/-)

– Good compression rates

Page 185: Grammar Induction: Techniques and Theory


Principle

The grammar maintains two invariants with respect to the string:

– each rule has to be used at least twice (rule utility);

– no sub-string of length 2 (digram) appears twice (digram uniqueness).

Page 186: Grammar Induction: Techniques and Theory


Examples

S → abcdbc   becomes   S → aAdA, A → bc

S → aabaaab  becomes   S → AbAab, A → aa
             or        S → AaA, A → aab
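A simplified offline sketch of the digram-uniqueness invariant (real SEQUITUR works online, symbol by symbol, and also enforces rule utility): repeatedly replace the most frequent digram with a fresh non-terminal until every digram is unique.

from collections import Counter

def induce(s):
    rules, names = {}, iter("ABCDEFGH")
    seq = list(s)
    while len(seq) > 1:
        digrams = Counter(zip(seq, seq[1:]))
        pair, n = digrams.most_common(1)[0]
        if n < 2:
            break                       # every digram is now unique
        nt = next(names)
        rules[nt] = list(pair)          # new rule nt -> pair
        out, i = [], 0
        while i < len(seq):             # left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

seq, rules = induce("abcabdabcabd")
print("S ->", seq, rules)   # e.g. S -> [D, D], with D expanding to abcabd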

Page 187: Grammar Induction: Techniques and Theory


abcabdabcabd

Page 188: Grammar Induction: Techniques and Theory


In the beginning, God created the heavens and the earth.

And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

And God said, Let there be light: and there was light.

And God saw the light, that it was good: and God divided the light from the darkness.

And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.

And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.

And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so.

And God called the firmament Heaven. And the evening and the morning were the second day.

Page 189: Grammar Induction: Techniques and Theory


Page 190: Grammar Induction: Techniques and Theory


Sequitur options:

• appending a symbol to rule S;

• using an existing rule;

• creating a new rule;

• and deleting a rule.

Page 191: Grammar Induction: Techniques and Theory


Results

On text:

– 2.82 bpc

– compress 3.46 bpc

– gzip 3.25 bpc

– PPMC 2.52 bpc

Page 192: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 193: Grammar Induction: Techniques and Theory


4.5 Using a simplicity bias (Langley & Stromsten, 00)

Based on algorithm GRIDS (Wolff, 82)

Main characteristics:

– MDL principle;

– Not characterizable;

– Not tested on large benchmarks.

Page 194: Grammar Induction: Techniques and Theory


Two learning operators

Creation of non-terminals and rules:

NP → ART ADJ NOUN
NP → ART ADJ ADJ NOUN

becomes

NP → ART AP1
NP → ART ADJ AP1
AP1 → ADJ NOUN

Page 195: Grammar Induction: Techniques and Theory


Merging two non terminals

NP → ART AP1
NP → ART AP2
AP1 → ADJ NOUN
AP2 → ADJ AP1

becomes

NP → ART AP1
AP1 → ADJ NOUN
AP1 → ADJ AP1

Page 196: Grammar Induction: Techniques and Theory


• Scoring function (MDL principle): |G| + Σw∈T |d(w)|, i.e. the size of the grammar plus the total size of the derivations of the training strings

• Algorithm:

– find the best merge that improves the current grammar

– if no such merge exists, find the best creation

– halt when there is no improvement
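A concrete reading of the score, under an assumed representation of grammars and derivations:

# |G|: symbols in the rules; Σ|d(w)|: rule applications per training string.
def mdl_score(rules, derivations):
    grammar_size = sum(1 + len(rhs) for _, rhs in rules)   # |G|
    deriv_size = sum(len(d) for d in derivations)          # Σ |d(w)|
    return grammar_size + deriv_size

rules = [("NP", ["ART", "AP1"]), ("AP1", ["ADJ", "NOUN"])]
derivations = [["NP", "AP1"]]          # one training phrase, two steps
print(mdl_score(rules, derivations))   # 6 + 2 = 8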

Page 197: Grammar Induction: Techniques and Theory


Results

• On subsets of English grammars (15 rules, 8 non terminals, 9 terminals): 120 sentences to converge

• on (ab)*: all (15) strings of length ≤ 30

• on Dyck1: all (65) strings of length ≤ 12

Page 198: Grammar Induction: Techniques and Theory


Algorithms

RPNI

K-Reversible

GRIDS

SEQUITUR

L*

Page 199: Grammar Induction: Techniques and Theory


5 Open questions and conclusions

• dealing with noise

• classes of languages that adequately mix Chomsky’s hierarchy with edit distance

• stochastic context-free grammars

• polynomial learning from text

• learning POMDPs

• fast algorithms…