Lec18-Perceptron
-
Linear Discriminators
Chapter 20
Only relevant parts
-
Concerns
Generalization Accuracy
Efficiency
Noise
Irrelevant features
Generality: when does this work?
-
Linear Model
Let f1, ..., fn be the feature values of an example. Let the class be denoted {+1, -1}.
Define f0 = 1. (carries the bias weight w0)
The linear model defines weights w0, w1, ..., wn; -w0 is the threshold.
Classification rule: if w0*f0 + w1*f1 + ... + wn*fn > 0, predict class +; else predict class -.
Briefly: W*F > 0, where * is the inner product of the weight vector and the feature vector, and F has been augmented with an extra 1.
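A minimal sketch of this rule in Python (the weights and points below are illustrative, not from the slides):

```python
import numpy as np

def augment(f):
    """Prepend the constant feature f0 = 1, so w0 carries the (negated) threshold."""
    return np.concatenate(([1.0], f))

def classify(w, f):
    """Linear rule: predict +1 if W*F > 0, else -1."""
    return 1 if np.dot(w, augment(f)) > 0 else -1

# Illustrative weights: w0 = -1.5 encodes the threshold 1.5 on f1 + f2
w = np.array([-1.5, 1.0, 1.0])
print(classify(w, np.array([1.0, 1.0])))   # 1.0 + 1.0 = 2.0 > 1.5 -> +1
print(classify(w, np.array([0.5, 0.5])))   # 0.5 + 0.5 = 1.0 < 1.5 -> -1
```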
-
Augmentation Trick
Suppose the data has features f1 and f2.
The classifier is: 2*f1 + 3*f2 > 4.
Equivalently: <2, 3, -4> * <f1, f2, 1> > 0. Mapping the data to <f1, f2, 1> allows learning/representing the threshold as just another feature.
Mapping data into higher dimensions is the key idea behind SVMs.
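A quick numeric check of this equivalence (a sketch; the random points are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(1000, 2))

orig = 2*pts[:, 0] + 3*pts[:, 1] > 4            # 2*f1 + 3*f2 > 4
aug = np.hstack([pts, np.ones((1000, 1))])      # map (f1, f2) -> (f1, f2, 1)
new = aug @ np.array([2.0, 3.0, -4.0]) > 0      # <2, 3, -4> * <f1, f2, 1> > 0

print(np.array_equal(orig, new))                # True: the two rules agree
```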
-
Mapping to enable Linear Separation
Let x1, ..., xm be m vectors in R^N.
Map xi into R^{N+m} by xi -> <xi, 0, ..., 0, 1, 0, ..., 0>, where the 1 is in position N+i.
For any labelling of the xi by classes +/-, this embedding makes the data linearly separable.
Define wi = 0 for i <= N and w_{N+i} = yi; then W * xi' = yi for each embedded point xi', so every example lands on the correct side.
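A sketch of this construction (the dimensions and labels below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 5, 3
X = rng.normal(size=(m, N))               # m arbitrary points in R^N
y = rng.choice([-1, 1], size=m)           # any labelling whatsoever

# Embed: xi -> (xi, e_i), with the extra 1 in position N+i
X_emb = np.hstack([X, np.eye(m)])

# Weights: 0 on the original N coordinates, w_{N+i} = yi
w = np.concatenate([np.zeros(N), y.astype(float)])

print(X_emb @ w)                          # equals y exactly
print(np.all(np.sign(X_emb @ w) == y))    # True: every labelling is separable
```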
-
Representational Power
OR of n features:
wi = 1, threshold = 0
AND of n features:
wi = 1, threshold = n - 1
k of n features (prototype):
wi = 1, threshold = k - 1
Can't do XOR: no single line separates its positive and negative points.
Combining linear threshold units yields any boolean function.
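A small sketch checking the three constructions directly; for XOR, a coarse grid search over weights and thresholds suggests (as the theory guarantees) that no single unit works:

```python
import numpy as np
from itertools import product

def unit(w, theta, f):
    """A linear threshold unit: fire (1) iff w*f > theta."""
    return int(np.dot(w, f) > theta)

n, k = 3, 2
for bits in product([0, 1], repeat=n):
    f = np.array(bits)
    assert unit(np.ones(n), 0,     f) == int(f.any())        # OR:  threshold 0
    assert unit(np.ones(n), n - 1, f) == int(f.all())        # AND: threshold n-1
    assert unit(np.ones(n), k - 1, f) == int(f.sum() >= k)   # k of n: threshold k-1
print("OR, AND, and k-of-n all realized by a single unit.")

# XOR of 2 features: no (w1, w2, theta) on this grid reproduces it
xor_ok = any(
    all(int(w1*a + w2*b > t) == (a != b) for a, b in product([0, 1], repeat=2))
    for w1 in np.linspace(-2, 2, 21)
    for w2 in np.linspace(-2, 2, 21)
    for t in np.linspace(-2, 2, 21)
)
print("XOR realizable by one unit on this grid:", xor_ok)    # False
```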
-
Classical Perceptron
Theorem: if the concept is linearly separable, then the algorithm finds a solution.
Training time can be exponential in the number of features.
An epoch is a single pass through the entire data.
Convergence can take exponentially many epochs, but is guaranteed to work: if |xi| <= R and the data is separable with margin g, the number of mistake-driven updates is at most (R/g)^2.
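The slide states the theorem but not the algorithm itself; a sketch of the classical mistake-driven update (the AND data below is illustrative):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Classical perceptron. X: examples augmented with a constant-1 feature;
    y: labels in {+1, -1}. On each mistake, add yi * xi to the weights."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):              # one epoch = one pass through the data
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:      # xi is on the wrong side
                w += yi * xi                 # the classical update
                mistakes += 1
        if mistakes == 0:                    # a separating W has been found
            break
    return w

# Tiny separable example: AND of two boolean features
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = perceptron(X, y)
print(w, np.sign(X @ w) == y)                # all True once converged
```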
-
Hill-Climbing Search
This is an optimization problem.
The solution is found by hill-climbing, so there is no guarantee of finding the optimal solution.
While the derivatives tell you the direction (the negative gradient), they do not tell you how much to change each Wj.
On the plus side it is fast. On the negative side, there is no guarantee of separation.
-
Hill-climbing View
Goal: minimize the squared error Err^2.
Let class yi be 1 or -1.
Let Err = W*Xi - Yi, where Xi is the ith example.
This is a function only of the weights.
Use calculus: take the partial derivatives with respect to Wj.
To move to a lower value, move in the direction of the negative gradient, i.e. the change in Wj is -2*Err*Xj.
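A sketch of this update as code, with a learning rate added to set the step size the gradient alone does not give (the data below is illustrative):

```python
import numpy as np

def epoch(w, X, y, lr=0.01):
    """One pass of gradient descent on the squared error: for each example,
    Err = W*Xi - Yi, and each weight Wj moves by -lr * 2 * Err * Xij."""
    for xi, yi in zip(X, y):
        err = np.dot(w, xi) - yi
        w = w - lr * 2 * err * xi
    return w

rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(-1, 1, (50, 1)), np.ones((50, 1))])  # augmented with 1
y = np.where(X[:, 0] > 0.2, 1.0, -1.0)       # true boundary at f1 = 0.2

w = np.zeros(2)
for _ in range(200):                          # fast, but no separation guarantee
    w = epoch(w, X, y)
print(w)                                      # learned boundary -w[1]/w[0] approximates the true one
```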
-
Support Vector Machine
Goal: maximize the margin.
Assuming the line separates the data, the margin is the distance from the line to the closest example, positive or negative.
Good news: this can be solved by a quadratic program.
Implemented in Weka as SMO.
If not linearly separable, the SVM will add more features.
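A minimal sketch using scikit-learn rather than Weka (an assumed substitute; SVC with a linear kernel solves the same quadratic program):

```python
import numpy as np
from sklearn.svm import SVC          # assumed available; stands in for Weka's SMO

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)),    # negative cluster
               rng.normal(+2, 0.5, (20, 2))])   # positive cluster
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)     # very large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("margin:", 1.0 / np.linalg.norm(w))       # distance from boundary to margin
print("support vectors:\n", clf.support_vectors_)
```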
-
If not Linearly Separable
1. Add more nodes: Neural Nets
   1. Can represent any boolean function: why?
   2. No guarantees about learning
   3. Slow
   4. Incomprehensible
2. Add more features: SVM
   1. Can represent any boolean function
   2. Learning guarantees
   3. Fast
   4. Semi-comprehensible
-
Adding features
Suppose the point (x, y) is positive if it lies in the unit disk, else negative.
This is clearly not linearly separable. Map (x, y) -> (x, y, x^2 + y^2).
Now, in 3-space, the data is easily separable.
This works for any learning algorithm, but an SVM will almost do it for you (set the parameters).
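A sketch of this disk example (the sampled points are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.uniform(-2, 2, size=(500, 2))
y = np.where((pts**2).sum(axis=1) <= 1, 1, -1)   # +1 inside the unit disk

# Lift into 3-space: (x, y) -> (x, y, x^2 + y^2)
lifted = np.hstack([pts, (pts**2).sum(axis=1, keepdims=True)])

# The plane z = 1 separates the classes: score > 0 inside, < 0 outside
scores = 1.0 - lifted[:, 2]
print(np.all(np.sign(scores) == y))              # True (boundary points aside)
```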