PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand...

56
November 21, 2016 ERATO 合宿 Partial Order Structure and Information Geometry (順序構造と情報幾何) Mahito Sugiyama (ISIR, Osaka University, PRESTO) (杉山 麿人; 大阪大学産業科学研究所, JST さきがけ)

Transcript of PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand...

Page 1: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

November 21, 2016ERATO合宿

Partial Order Structure andInformation Geometry(順序構造と情報幾何)

Mahito Sugiyama (ISIR, Osaka University, PRESTO)(杉山 麿人;大阪大学産業科学研究所, JSTさきがけ)

Page 2: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Today’s Model on Poset (S, ≤)

log p(x) =s∈S

ζ(s, x)θ(s)

p(x) =s∈S

µ(x , s)η(s)

1/39

Page 3: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Today’s Model on Poset (S, ≤)

log p(x) =s∈S

ζ(s, x)θ(s)

p(x) =s∈S

µ(x , s)η(s)

Probability

Coe�cient of log-linear model(Bias/weight in Boltzmann machines)(Natural parameter of exponential family)

Zeta function

Möbius function Expectation(Frequency in pattern mining)(Su�cient statistics in exponential family)

log p(x) =s∈S

ζ(s, x)θ(s)

p(x) =s∈S

µ(x , s)η(s)

1/39

Page 4: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Outcome

• Given a poset (S , ≤) and consider distributions on S– The least element⊥ ∈ S

1. KL divergence decomposition:DKL[P, R] = DKL[P, Q] + DKL[Q , R]with Q s.t. θQ (x) = θR(x) or ηQ (x) = ηP(x) for all x ∈ S \ {⊥}

2. The set of probability distributions on (S , ≤) isa dually flat manifold w.r.t. θ and η– p, θ, and η are coordinate systems– θ and η are orthogonal– θ introduces the structure of exponential family– η introduces the structure of mixture family

2/39

Page 5: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Partially Ordered Sets

{x, y, z}

{x, z} {y, z}

{y} {z}

{x, y}

{x}

Power set

3/39

Page 6: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Partially Ordered Sets

{x, y, z}

{x, z} {y, z}

{y} {z}

{x, y}

{x}

Power set

0

1

2

3

Positive integers

3/39

Page 7: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Partially Ordered Sets

{x, y, z}

{x, z} {y, z}

{y} {z}

{x, y}

{x}

Power set

0

1

2

3

Positive integers

01 10

0 1

1100

111110101100011010001000

λ

Pre�xes

3/39

Page 8: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Partially Ordered Sets

{x, y, z}

{x, z} {y, z}

{y} {z}

{x, y}

{x}

Power set

0

1

2

3

Positive integers

01 10

0 1

1100

111110101100011010001000

λ

Pre�xes

Directed Acyclic Graph

3/39

Page 9: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Posets with Probability Distribution

Probability distributionon posets (partially ordered sets)

Prob

abili

ty

4/39

Page 10: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Posets with Probability Distribution

Probability distributionon posets (partially ordered sets)

Prob

abili

ty

Informationgeometry

log p(x) = ∑ ζ(s, x)θ(s)Decomposition inthe log-linear model

4/39

Page 11: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Posets with Probability Distribution

Probability distributionon posets (partially ordered sets)

Prob

abili

ty

log p(x) = ∑ ζ(s, x)θ(s)Decomposition inthe log-linear model

(e.g. Neurons, SNPs, ...)

0 0 11 0 01 1 10 0 01 1 00 1 11 0 11 0 11 0 11 1 0

Numerical score(KL divergence)and the p-valuefor higher-orderintractions

{x, y, z}{x, z}

{y, z}

{y}

{z}

{x, y}

{x}

{x, y, z}{x, z}

{y, z}

{y}

{z}

{x, y}

{x}

4/39

Page 12: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Binary vectors(Transactiondatabase)

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

5/39

Page 13: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Binary vectors(Transactiondatabase)

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

6/39

Page 14: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Binary vectors(Transactiondatabase)

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

7/39

Page 15: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Binary vectors(Transactiondatabase)

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

, 0.0

0.1 0.1 0.0

0.3 0.2 0.0

Probability = 0.3

7/39

Page 16: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

, 0.0

0.1 0.1 0.0

0.3 0.2 0.0

Probability = 0.3

Upward =Pattern mining

η( ) = p( ) + p( ){ , } { , } { , , }

η: Frequencyp: Probability

7/39

Page 17: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

Poset (itemset lattice)

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

, 0.0

0.1 0.1 0.0

0.3 0.2 0.0

Probability = 0.3

Upward =Pattern mining

η( ) = p( ) + p( ){ , } { , } { , , }

η: Frequencyp: Probabilityη: Frequencyp: Probabilityθ: Coe�cient of

log-linear model

log p( ) = θ( ) + θ( ) + θ( ) + θ(∅){ , } { , } { } { }

Downward =Log-linear analysis

8/39

Page 18: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

log p(x) = ∑ ζ(s, x)θ(s)

9/39

Page 19: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

log p(x) = ∑ ζ(s, x)θ(s)

p(x) = exp( ∑ θ(s)Fs(x) – ψ(θ) )Exponentialfamily:

Naturalparameter

e.g. Gaussian

10/39

Page 20: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

log p(x) = ∑ ζ(s, x)θ(s)

p(x) = exp( ∑ θ(s)Fs(x) – ψ(θ) )Exponentialfamily:

Naturalparameter

e.g. Gaussian

Su�cientstatistics ofexponentialfamily

η(x) = �[ Fx(s) ]

η(x) = ∑ ζ(x, s)p(s)

11/39

Page 21: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Möbius Inversion on Posets

• Zeta function ζ∶S × S → {0, 1}:ζ(s, x) = { 1 if s ≤ x ,

0 otherwise

• Möbius function µ∶S × S → Z, defined as µ = ζ −1:

µ(x , y) = ⎧⎪⎪⎪⎨⎪⎪⎪⎩1, if x = y,−∑x≤s<y µ(x , s) if x < y,0 otherwise

• The Möbius inversion formula [Rota (1964)]:g(x) = ∑

s∈Sζ(s, x)f (s) ⇔ f (x) = ∑

s∈Sµ(s, x)g(s)

12/39

Page 22: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Möbius Function Is Generalization ofInclusion-Exclusion Principle

• For sets A, B, C ,∣A ∪ B ∪ C∣ = ∣A∣ + ∣B∣ + ∣C∣ − ∣A ∩ B∣ − ∣B ∩ C∣ − ∣A ∩ C∣+ ∣A ∩ B ∩ C∣

• In general, for A1 , A2 , . . . , An ,»»»»»»»»»»⋃i A i

»»»»»»»»»» = ∑J⊆{1, . . . ,n}, J/=∅(−1)∣J∣−1

»»»»»»»»»»»⋂j∈J A j

»»»»»»»»»»»• The Möbius function µ is the generalization of “(−1)∣J∣−1”

13/39

Page 23: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Log-linear Model with Möbius Inversion

• Log-linear model and its sufficient statistics:

log p(x) = ∑s∈S

ζ(s, x)θ(s) = ∑s≤x

θ(s),η(x) = ∑

s∈Sζ(x , s)p(s) = ∑

s≥xp(s)

– Generalization of the log-linear model on binary vectors:

log p(x) = ∑i

θ i x i +∑i< j

θ i j x i x j + ⋅ ⋅ ⋅ + θ 1. . .n x 1x2 . . . xn ,

• From the Möbius inversion formula,θ(x) = ∑

s∈Sµ(s, x) log p(s), p(x) = ∑

s∈Sµ(x , s)η(s)

14/39

Page 24: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

15/39

Page 25: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ }

{ }p( )

p( )

{ , }p( )

15/39

Page 26: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ }

{ }p( )

p( )

{ , }p( )0.3

0.20.2

0.40.4

Probability distributionis a “point” in 3D space

16/39

Page 27: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ }

{ }η( )

η( )

{ , }η( )

0.20.2

0.60.6

0.50.5

Probability distributionis a “point” in 3D space

17/39

Page 28: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ }

{ }θ( )

θ( )

{ , }θ( )

–1.79–1.791.391.39

1.101.10

Probability distributionis a “point” in 3D space

18/39

Page 29: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ , }

{ }

{ }

one-to-one

19/39

Page 30: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

{ , }

{ } { }

0.30.5

1.10

0.20.2

–1.79

0.11.0

–2.30

0.40.6

1.39

pηθ

Triple for each node

η(x) = ∑ p(s)s ≥ x

log p(x) = ∑ θ(s)s ≤ x

{ , }

one-to-one

{ }θ( ){ }

{ }η( )θ and η aredually orthogonal

19/39

Page 31: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Orthogonality of θ and η

• FromMöbius inversion,

∑s∈S

ζ(x , s)µ(s, y) = δx ,y , δx ,y = { 1 if x = y,0 otherwise

• θ and η are dually orthogonal:

E [ ∂∂θ(x) log p(s) ∂

∂η(y) log p(s)] = ∑s∈S

ζ(x , s)µ(s, y) = δx ,y

• Partial order structure leads to the samedually flat structure with the exponential family

20/39

Page 32: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Existing Approach Limited To Power Set

{x, y, z}

{x, z} {y, z}

{y} {z}

{x, y}

{x}

Power set

21/39

Page 33: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Our Approach Applies To Any Posets

{x, y, z}

{y, z}

{y}

{x, y}

{x}

Subset of power set

22/39

Page 34: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Our Approach Applies To Any Posets

0

1

2

3

Positive integers

01 10

0 1

1100

111110101100011010001000

λ

Pre�xes

Directed Acyclic Graph

{x, y, z}

{y, z}

{y}

{x, y}

{x}

Subset of power set

22/39

Page 35: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

KL Divergence Decomposition

• KL divergence decomposition:DKL[P, R] = DKL[P, Q] + DKL[Q , R]with Q s.t. θQ (x) = θR(x) or ηQ (x) = ηP(x) for all x ∈ S \ {⊥}– Q is called the mixed distribution of (P, R)– It is known as the (generalized) Pythagoras theoremin Information Geometry

• We can derive fromMöbius inversion:DKL[P, Q] + DKL[Q , R] − DKL[P, R]

= ∑s∈S

(ηQ (s) − ηP(s)) (θQ (s) − θR(s))23/39

Page 36: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

24/39

Page 37: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

choose ηchoose θMixed

distribution Q

??????1.95

???0.2???

??????0.0

24/39

Page 38: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

choose ηchoose θMixed

distribution Q

0.620.51.95

0.0890.60.0

–1.13

0.20.2

–1.13

25/39

Page 39: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

choose ηchoose θMixed

distribution Q

0.620.51.95

0.0890.60.0

–1.13

0.20.2

–1.13

KL[P, R]

26/39

Page 40: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

choose ηchoose θMixed

distribution Q

0.620.51.95

0.0890.60.0

–1.13

0.20.2

–1.13

KL[P, R] KL[P, Q]+ KL[Q, R]

=

Nonnegative decompositionof the KL divergence

27/39

Page 41: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.70.81.95

0.10.1

–1.950.10.20.0

Dist. P Dist. Rpηθ

choose ηchoose θMixed

distribution Q

0.620.51.95

0.0890.60.0

–1.13

0.20.2

–1.13

+ =

Nonnegative decompositionof the KL divergence

0.3946 0.04440.4390

28/39

Page 42: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.250.50.0

0.250.250.0

0.250.50.0

Dist. P Uniform dist. P0pηθ

29/39

Page 43: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.250.50.0

0.250.250.0

0.250.50.0

Dist. P Uniform dist. P0pηθ

choose η choose θ(KNOCK DOWN)

log p(x) = ∑ θ(s)s ≤ x

Log-linear model

???0.5???

??????0.0

???0.6???

Mixeddistribution Q

29/39

Page 44: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.250.50.0

0.250.250.0

0.250.50.0

Dist. P Uniform dist. P0pηθ

choose η choose θ(KNOCK DOWN)

log p(x) = ∑ θ(s)s ≤ x

Log-linear model

0.20.50.0

0.30.30.0

0.30.60.405

Mixeddistribution R

Contribution of the node= KL[P, Q] = 0.086

30/39

Page 45: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

0.30.51.10

0.20.2

–1.790.40.61.39 ∅

0.250.50.0

0.250.250.0

0.250.50.0

Dist. P Uniform dist. P0pηθ

choose η choose θ(KNOCK DOWN)

log p(x) = ∑ θ(s)s ≤ x

Log-linear model

0.20.50.0

0.30.30.0

0.30.60.405

Mixeddistribution R

Contribution of the node= KL[P, Q] = 0.086

The statistics λ: λ = 2·[sample size]·KL[P, Q]follows χ2-distributionwith d.f. [#nodes – 1]⇒ p-value can be obtained!

31/39

Page 46: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Poset of Subgraphs

32/39

Page 47: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Log-Linear Model on Subgraphs

log p(x) = ∑ θ(s)s ⊑ x

Log-linear model:

Natural parameterof exponential familyNatural parameterof exponential family

η(x) = ∑ p(s)s ⊒ x

Su�cient statisticsof exponential familySu�cient statisticsof exponential family

33/39

Page 48: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Information of Each Subgraph

η0 η1 η2 η3 η4 η5 ηkθ0 θ1 θ2 θ3 θ4 θ5 θk

P: Empirical distribution ηj

θi

Freq.Coef.

34/39

Page 49: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Information of Each Subgraph

η0 η1 η2 η3 η4 η5 ηkθ0 θ1 θ2 θ3 θ4 θ5 θk

P: Empirical distribution ηj

θi

Freq.Coef.

KL(P, Q)

η0 η1 η2 η3 ? η5 ηk? ? ? ? 0 ? ?

Q: Null distribution KL(P, Q)

Freq.Coef.

34/39

Page 50: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Information of Each Subgraph

η0 η1 η2 η3 η4 η5 ηkθ0 θ1 θ2 θ3 θ4 θ5 θk

P: Empirical distribution ηj

θi

Freq.Coef.

KL(P, Q)

η0 η1 η2 η3 ? η5 ηk? ? ? ? 0 ? ?

Q: Null distribution KL(P, Q)

Freq.Coef.

= KL(P, Q)= KL(P, R)+ KL(Q, Q

KL(Q, R)

KL(P, R)= KL(P, Q)+ KL(Q, R)

1.0 ? ? ? ? ? ?θ0’ 0 0 0 0 0 0

R: Uniform distribution KL(Q, R)

Freq.Coef.

34/39

Page 51: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Make a Poset from Data

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Dataset

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

Number of nodes = 2#features

⇒combinatorial explosion!

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

, 0.0

0.1 0.1 0.0

0.3 0.2 0.0

Probability = 0.3

35/39

Page 52: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Make a Poset from Data

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Dataset

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ } { }

{ , }

{ , , }

{ }

1.0

0.9 0.7 0.5

0.6 0.5 0.3

Frequency = 0.3

, 0.0

0.1 0.1 0.0

0.3 0.2 0.0

Probability = 0.3

Probability ≥ 0.2(user speci�ed threshold)

35/39

Page 53: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Remove Nodes with Probability 0

ID 1:ID 2:ID 3:ID 4:ID 5:ID 6:ID 7:ID 8:ID 9:ID10:

Dataset

1 1 01 1 11 1 01 1 11 1 01 0 11 0 11 1 11 0 00 1 0

{ , }{ , }

{ , , }0.30.30.0

pηθ

0.30.6

0.405

0.20.50.0

0.21.0

–1.61

= 1 – ∑ p(s)

36/39

Page 54: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Example on Real Data (kosarak)

ID 1:ID 2:ID 3:ID 4:ID 5:

1 1 01 1 11 1 01 1 11 1 0

Sample size:990,002

# features: 41,270# nodes: 3,253(Threshold: 10–5)

# signi�cant interactions: 583Single feature: 537Pairwise interactions: 41Triple interactions: 5

Total runtime:4.95 seconds

37/39

Page 55: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Example on Real Data (accidents)

ID 1:ID 2:ID 3:ID 4:ID 5:

1 1 01 1 11 1 01 1 11 1 0

Sample size:340,183

# nodes: 281(Threshold: 5×10–6)

# signi�cant interactions: 280# features in each interactionis between 26 to 41

Total runtime:4.95 seconds

# features: 468

38/39

Page 56: PartialOrderStructureand InformationGeometryNovember21,2016 ERATO合宿 PartialOrderStructureand InformationGeometry (順序構造と情報幾何) MahitoSugiyama(ISIR,OsakaUniversity,PRESTO)

Conclusion

• A close connection between the partial order structureand information geometry– Möbius inversion leads to the dually flat manifolds

◦ M. Sugiyama, H. Nakahara, K. Tsuda,Information Decomposition on Structured Space,IEEE ISIT (2016)

◦ S. Amari, Information geometry on hierarchy ofprobability distributions, IEEE Trans. Info. Theory (2001)

◦ H. Nakahara, S. Amari, Information-geometric measurefor neural spikes, Neural Computation (2002)

• We can decompose the KL divergence andasses the significance on any posets

39/39