Introduction to Machine Learning

Machine Learning: Jordan Boyd-Graber, University of Maryland

RADEMACHER COMPLEXITY

Slides adapted from Rob Schapire

Recap

• Rademacher complexity provides nice guarantees:

  $R(h) \le \hat{R}(h) + \mathfrak{R}_m(H) + O\left(\sqrt{\frac{\log\frac{1}{\delta}}{2m}}\right)$  (1)

• But in practice hard to compute for real hypothesis classes
• Is there a relationship with simpler combinatorial measures?

Growth Function

Define the growth function $\Pi_H : \mathbb{N} \to \mathbb{N}$ for a hypothesis set $H$ as:

  $\forall m \in \mathbb{N},\quad \Pi_H(m) \equiv \max_{\{x_1,\ldots,x_m\} \subseteq X} \left|\left\{ (h(x_1),\ldots,h(x_m)) : h \in H \right\}\right|$  (2)

i.e., the number of ways $m$ points can be classified using $H$.
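The growth function can be computed by brute force when both the sample space and the hypothesis class are small and finite. A minimal sketch (illustrative only; the helper names and the threshold example are made up here, not from the slides):

```python
from itertools import combinations

def dichotomies(points, hypotheses):
    """Distinct labelings of `points` realized by `hypotheses`."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def growth_function(m, sample_space, hypotheses):
    """Brute-force Pi_H(m): max number of dichotomies over all m-point
    subsets. Only feasible for small finite sets."""
    return max(len(dichotomies(S, hypotheses))
               for S in combinations(sample_space, m))

# Example: threshold classifiers h_t(x) = 1 if x >= t, else 0.
thresholds = [t / 2 for t in range(-1, 22)]
hypotheses = [lambda x, t=t: int(x >= t) for t in thresholds]
sample_space = list(range(10))

for m in range(1, 5):
    print(m, growth_function(m, sample_space, hypotheses))
# Thresholds realize m + 1 dichotomies on m points, so this prints 2, 3, 4, 5.
```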

Rademacher Complexity vs. Growth Function

If $G$ is a family of functions taking values in $\{-1,+1\}$, then

  $\mathfrak{R}_m(G) \le \sqrt{\frac{2 \ln \Pi_G(m)}{m}}$  (3)

Uses Massart's lemma.

Vapnik-Chervonenkis Dimension

  $\mathrm{VC}(H) \equiv \max\left\{ m : \Pi_H(m) = 2^m \right\}$  (4)

The size of the largest set that can be fully shattered by $H$.
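Continuing the brute-force sketch above (it reuses the hypothetical `growth_function`, `sample_space`, and `hypotheses` from the previous snippet), the definition translates directly into code on a finite sample space:

```python
def vc_dimension(sample_space, hypotheses):
    """Largest m (within this finite sample space) with Pi_H(m) = 2**m.
    Reuses growth_function/sample_space/hypotheses from the sketch above."""
    d = 0
    for m in range(1, len(sample_space) + 1):
        if growth_function(m, sample_space, hypotheses) == 2 ** m:
            d = m
        else:
            break  # once shattering fails at m, it fails for all larger m
    return d

print(vc_dimension(sample_space, hypotheses))  # thresholds: prints 1
```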

VC Dimension for Hypotheses

• Need upper and lower bounds
• Lower bound: give an example, a set of $d$ points that $H$ shatters
• Upper bound: prove that no set of $d+1$ points can be shattered by $H$ (harder)

Intervals

What is the VC dimension of $[a,b]$ intervals on the real line?

• What about two points?
• Two points can be perfectly classified, so VC dimension ≥ 2
• What about three points?
• No set of three points can be shattered: labeling the two outer points + and the middle point − would require an interval that contains both ends but not the middle
• Thus, the VC dimension of intervals is 2 (a brute-force check follows)
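Both claims are easy to verify by brute force; a sketch (the endpoint grid is an arbitrary choice for illustration):

```python
# Interval classifiers h_{a,b}(x) = 1 if a <= x <= b, else 0.
endpoints = [t / 2 for t in range(-2, 10)]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in endpoints for b in endpoints if a <= b]

def labelings(points):
    return {tuple(h(x) for x in points) for h in intervals}

print(len(labelings([1, 2])))     # 4 = 2**2: two points are shattered
print(len(labelings([1, 2, 3])))  # 7 < 2**3: (1, 0, 1) is never realized
```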

Sine Functions

• Consider the hypothesis class that classifies points on a line as being above or below a sine wave:

  $\{ x \mapsto \sin(\omega x) : \omega \in \mathbb{R} \}$  (5)

• Can you shatter three points?
• Can you shatter four points?
• How many points can you shatter?
• Thus, the VC dimension of sine functions on the line is ∞ (a sketch of the construction follows)
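The classic construction behind this places the points at $x_i = 10^{-i}$; for any target labeling, a single frequency $\omega$ realizes it. A sketch of that construction (my own demonstration, not from the slides):

```python
import math

def shattering_omega(labels):
    """For points x_i = 10**-i, return a frequency omega such that
    sign(sin(omega * x_i)) reproduces the 0/1 labels (1 = positive)."""
    return math.pi * (1 + sum((1 - y) * 10 ** i
                              for i, y in enumerate(labels, start=1)))

labels = [1, 0, 0, 1, 0, 1, 1, 0]   # an arbitrary labeling of 8 points
points = [10 ** -i for i in range(1, len(labels) + 1)]
omega = shattering_omega(labels)
print([int(math.sin(omega * x) > 0) for x in points] == labels)  # True
```

Since this works for every labeling of any number of points, no finite $d$ bounds the class.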

Connecting VC with growth function

VC dimension obviously encodes the complexity of a hypothesis class, but we want to connect it to Rademacher complexity and the growth function so we can prove generalization bounds.

Theorem (Sauer's Lemma)

Let $H$ be a hypothesis set with VC dimension $d$. Then for all $m \in \mathbb{N}$,

  $\Pi_H(m) \le \sum_{i=0}^{d} \binom{m}{i} \equiv \Phi_d(m)$  (6)

This is good because each term $\binom{m}{i} = \frac{m(m-1)\cdots(m-i+1)}{i!} = O(m^i)$, so the whole sum is $O(m^d)$. When we plug this into the learning error bounds: $\log(\Pi_H(2m)) = \log(O(m^d)) = O(d \log m)$.
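Numerically, $\Phi_d(m)$ is easy to evaluate; a quick sketch comparing it to the interval class ($d = 2$) from earlier, where the bound happens to be exactly tight:

```python
from math import comb

def phi(d, m):
    """Sauer bound: Phi_d(m) = sum_{i=0}^{d} C(m, i) = O(m**d)."""
    return sum(comb(m, i) for i in range(d + 1))

# Intervals (VC dimension 2): on m sorted points the dichotomies are the
# contiguous runs of 1s plus the all-zeros labeling: 1 + m*(m+1)//2 in total.
for m in range(1, 7):
    print(m, 1 + m * (m + 1) // 2, phi(2, m))  # the two columns are equal
```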

Proof of Sauer's Lemma

Prelim: we'll proceed by induction on $m + d$. Our two base cases are:

• If $m = 0$, $\Pi_H(m) = 1$. You have no data, so there's only one (degenerate) labeling
• If $d = 0$, $\Pi_H(m) = 1$. If you can't even shatter a single point, then $H$ is a fixed function

Induction Step

Assume the lemma holds for all $m'$, $d'$ for which $m' + d' < m + d$. We are given $H$, $|S| = m$, $S = \langle x_1, \ldots, x_m \rangle$, and $d$ is the VC dimension of $H$.

Build two new hypothesis spaces on $S' = \langle x_1, \ldots, x_{m-1} \rangle$: $H_1$, the restriction of $H$ to $S'$, and $H_2$, the labelings in $H_1$ whose two extensions to $x_m$ both occur in $H$ restricted to $S$.

$H_2$ encodes where hypotheses that agree on the first $m-1$ points differ on $x_m$.
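On a concrete dichotomy set the construction is a one-liner each, and the counting identity $|\Pi_H(S)| = |H_1| + |H_2|$ used below can be checked directly (a sketch with made-up helper names):

```python
def split_dichotomies(dichos):
    """Given the dichotomies of S = (x_1, ..., x_m) as a set of 0/1 tuples,
    build H1 (restrictions to the first m-1 points) and H2 (restrictions
    whose two extensions to x_m both occur in the original set)."""
    h1 = {t[:-1] for t in dichos}
    h2 = {t[:-1] for t in dichos if t[:-1] + (1 - t[-1],) in dichos}
    return h1, h2

# Intervals on the points (1, 2, 3): the seven realizable labelings.
dichos = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
          (1, 1, 0), (0, 1, 1), (1, 1, 1)}
h1, h2 = split_dichotomies(dichos)
print(len(dichos), len(h1) + len(h2))  # 7 7: the counting identity holds
```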

What is VC dimension of H1 and H2?

• If a set is shattered by $H_1$, then it is also shattered by $H$:

  $\text{VC-dim}(H_1) \le \text{VC-dim}(H) = d$  (7)

• If a set $T$ is shattered by $H_2$, then $T \cup \{x_m\}$ is shattered by $H$, since there are two hypotheses in $H$ for every element of $H_2$ once we add $x_m$:

  $\text{VC-dim}(H_2) \le d - 1$  (8)

Bounding Growth Function

  $|\Pi_H(S)| = |H_1| + |H_2|$  (9)

  $\le \sum_{i=0}^{d} \binom{m-1}{i} + \sum_{i=0}^{d-1} \binom{m-1}{i}$  (10)

  $= \sum_{i=0}^{d} \left[ \binom{m-1}{i} + \binom{m-1}{i-1} \right]$  (11)

  $= \sum_{i=0}^{d} \binom{m}{i}$  (12)

  $= \Phi_d(m)$  (13)

Step (10) applies the inductive hypothesis to $H_1$ (VC dimension at most $d$, on $m-1$ points) and to $H_2$ (VC dimension at most $d-1$, on $m-1$ points). We can rewrite the second sum as $\sum_{i=0}^{d} \binom{m-1}{i-1}$ because $\binom{x}{-1} = 0$, which gives (11). Step (12) is Pascal's triangle: $\binom{m-1}{i} + \binom{m-1}{i-1} = \binom{m}{i}$.


Wait a minute . . .

Is this combinatorial expression really $O(m^d)$?

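Yes, for $m \ge d$: the standard corollary of Sauer's lemma (not shown on these slides) is $\Phi_d(m) \le (em/d)^d$, i.e., polynomial in $m$ with exponent $d$. A quick numeric check, not a proof:

```python
from math import comb, e

def phi(d, m):
    """Phi_d(m) = sum_{i=0}^{d} C(m, i)."""
    return sum(comb(m, i) for i in range(d + 1))

# Phi_d(m) <= (e*m/d)**d for m >= d: the polynomial bound used next.
for d in (2, 5, 10):
    for m in (d, 2 * d, 10 * d, 100 * d):
        assert phi(d, m) <= (e * m / d) ** d
print("Phi_d(m) <= (e*m/d)**d on all checks")
```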

Generalization Bounds

Combining our previous generalization results with Sauer's lemma, we have that for a hypothesis class $H$ with VC dimension $d$, for any $\delta > 0$, with probability at least $1 - \delta$, for any $h \in H$:

  $R(h) \le \hat{R}(h) + \sqrt{\frac{2d \log \frac{em}{d}}{m}} + \sqrt{\frac{\log \frac{1}{\delta}}{2m}}$  (14)
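To get a feel for the rate, a small sketch evaluating the two complexity terms of Eq. (14) for hypothetical values $d = 10$ and $\delta = 0.05$:

```python
from math import e, log, sqrt

def vc_bound_gap(m, d, delta):
    """Bound on the generalization gap R(h) - R_hat(h) from Eq. (14)."""
    return sqrt(2 * d * log(e * m / d) / m) + sqrt(log(1 / delta) / (2 * m))

for m in (100, 1_000, 10_000, 100_000):
    print(f"m = {m:>7,}: gap <= {vc_bound_gap(m, d=10, delta=0.05):.3f}")
# The gap shrinks roughly like sqrt(d * log(m) / m).
```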

Whew!

• We now have some theory down
• We're now going to see if we can find an algorithm that has good VC dimension
• And works well in practice . . . Support Vector Machines
• In class: more VC dimension examples