Machine Learning
Introduction to Machine Learning
Marek Petrik
January 26, 2017
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with Applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
What is machine learning?

Arthur Samuel (1959, IBM):

Field of study that gives computers the ability to learn without being explicitly programmed
The rise of machine learning

ICML: International Conference on Machine Learning

2009: 500 attendees
2015: 1600 attendees
2016: 3300 attendees
Data is everywhere!
IBM Watson: Computers Beat Humans in Jeopardy
Fair use, https://en.wikipedia.org/w/index.php?curid=31142331
AlphaGo: Computers Beat Humans in Go
Photograph by Saran Poroong, Getty Images/iStockphoto
Personalized Product Recommendations
Online retailers mine purchase history to make recommendations
Identifying Crops from Space
Identify the type of crop / vegetation / urban space from satelliteobservations
Other Applications
1. Health-care: Identify risks of getting a disease
2. Health-care: Predict effectiveness of a treatment
3. Detect spam in emails
4. Recognize hand-written text
5. Speech recognition (speech to text)
6. Machine translation
7. Predict probability of an employee leaving
Activity: any other applications?
This Course: Introduction to Machine Learning
I Build a foundation for practice and research in ML
I Basic machine learning concepts: max likelihood, cross validation
I Fundamental machine learning techniques: regression, model selection, deep learning
I Educational goals:
  1. How to apply basic methods
  2. Reveal what happens inside
  3. What are the pitfalls
  4. Expand understanding of linear algebra, statistics, and optimization
Course Overview
I Grading:
50% 6 Assignments
15% Midterm exam
30% Final exam
15% Class project
I Assignments: posted on myCourses and the website
I Discuss questions:
https://piazza.com/unh/spring2017/cs780cs880
I Programming language: R (or Python, but discouraged)
Course Texts
I Textbooks:
  ISL James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning
  ELS Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer Series in Statistics (2nd ed.)
I Other sources:
  DL Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. [online pdf on github]
  LA Strang, G. (2016). Introduction to Linear Algebra.
  CO Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. [online pdf]
  RL Sutton, R. S., & Barto, A. (2012). Reinforcement Learning. 2nd edition [online pdf draft]
Class Plan I
Date Day Topic
Jan 24 Tue Statistical learning and R language
Jan 26 Thu Linear regression with one variable
Jan 31 Tue Linear regression with multiple variables
Feb 02 Thu No class
Feb 07 Tue Classification and logistic regression 1
Feb 09 Thu Classification, naive Bayes and LDA
Feb 14 Tue Linear algebra for machine learning: review
Feb 16 Thu Linear algebra and optimization
Feb 21 Tue Overfitting and resampling methods
Feb 23 Thu Cross-validation and bootstrapping
Feb 28 Tue Linear model selection, priors
Mar 02 Thu Linear model selection and regularization
Class Plan II

Mar 07 Tue Midterm review
Mar 09 Thu Midterm exam; material until 2/23
Mar 14 Tue Spring break, no class
Mar 16 Thu Spring break, no class
Mar 21 Tue Building nonlinear features
Mar 23 Thu Nearest neighbor methods and GAMs
Mar 28 Tue Tree-based methods and boosting
Mar 30 Thu Support vector machines and other techniques
Apr 04 Tue Unsupervised learning, PCA
Apr 06 Thu Unsupervised learning, k-means
Apr 11 Tue Reinforcement learning
Apr 13 Thu Neural networks and deep learning
Apr 18 Tue Neural networks and deep learning
Apr 20 Thu Big data and machine learning
Class Plan III
Apr 25 Tue Machine learning in practice
Apr 27 Thu Project presentations
May 02 Tue Guest speaker
May 04 Thu Final exam review
May 11-17 ? Final exam
What is Machine Learning

I Discover unknown function f:

Y = f(X)

I X = set of features, or inputs
I Y = target, or response
[Figure: scatter plots of Sales against TV, Radio, and Newspaper advertising budgets]
Sales = f(TV,Radio,Newspaper)
Notation
Y = f(X) = f(X1, X2, X3)
Sales = f(TV,Radio,Newspaper)
I Y = Sales
I X1 = TV
I X2 = Radio
I X3 = Newspaper
Vector:

X = (X1, X2, X3)
Dataset
Sales = f(TV,Radio,Newspaper)
[Figure: scatter plots of Sales against TV, Radio, and Newspaper advertising budgets]
Dataset:

Y    X1   X2   X3
10   101  20   35
20   66   41   85
11   101  43   78
25   25   10   6
15   310  51   11

rows are samples
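The course uses R, but the sample layout above translates directly into code. A minimal Python sketch (the numbers are the toy values from the table, not real advertising data):

```python
# Each row of X is one sample with features (X1 = TV, X2 = Radio, X3 = Newspaper);
# Y holds the corresponding targets (Sales), one per row.
X = [
    [101, 20, 35],
    [66, 41, 85],
    [101, 43, 78],
    [25, 10, 6],
    [310, 51, 11],
]
Y = [10, 20, 11, 25, 15]

# One target per sample; sample i is the pair (X[i], Y[i]).
assert len(X) == len(Y)
```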
Errors in Machine Learning: World is Noisy

I World is too complex to model precisely
I Many features are not captured in data sets
I Need to allow for errors ε in f:

Y = f(X) + ε
Machine Learning Algorithm
I Input: Training data-set with features and targets
I Output: Prediction function f
Parametric Prediction Methods
[Figure: 3-D surface of Income as a function of Years of Education and Seniority]
Linear models (linear regression)
income = f(education, seniority) = β0+β1×education+β2×seniority
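The parametric form above can be written out in a few lines. A Python sketch, where the β values are made-up placeholders rather than fitted coefficients (a learning algorithm would estimate them from data):

```python
def linear_model(education, seniority, beta0=20.0, beta1=2.5, beta2=0.4):
    """Linear model: income = beta0 + beta1 * education + beta2 * seniority.

    The beta values here are illustrative placeholders, not coefficients
    fitted to any real data.
    """
    return beta0 + beta1 * education + beta2 * seniority

# Predicted income for 16 years of education and 10 years of seniority:
# 20 + 2.5 * 16 + 0.4 * 10 = 64.0
income = linear_model(16, 10)
```

Learning the model then amounts to choosing β0, β1, β2 that make such predictions match the training targets.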
Why Estimate f?
[Figure: scatter plots of Sales against TV, Radio, and Newspaper advertising budgets]
1. Prediction: Make predictions about the future: Best medium mix to spend ad money?
2. Inference: Understand the relationship: What kind of ads work? Why?
Prediction or Inference?

Application                                  Prediction  Inference
Identify risk of getting a disease
Predict effectiveness of a treatment
Recognize hand-written text
Speech recognition
Predict probability of an employee leaving
Statistical View of Machine Learning
I Probability space Ω: Set of all adults
I Random variable X : Ω → R: Years of education
I Random variable Y : Ω → R: Salary
[Figure: Income plotted against Years of Education, with a fitted curve]
How Good are Predictions?
I Learned function f
I Test data: (x1, y1), (x2, y2), . . .
I Mean Squared Error (MSE):

MSE = (1/n) ∑_{i=1}^n (y_i − f(x_i))²

I This is an estimate of:

MSE = E[(Y − f(X))²] = (1/|Ω|) ∑_{ω∈Ω} (Y(ω) − f(X(ω)))²

I Important: Samples x_i are i.i.d.
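The empirical MSE is a one-line computation. A minimal Python sketch with made-up targets and predictions:

```python
def mse(targets, predictions):
    """Empirical mean squared error: (1/n) * sum of (y_i - f(x_i))^2."""
    assert len(targets) == len(predictions)
    n = len(targets)
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / n

# Residuals are 1, 0, and -2, so the MSE is (1 + 0 + 4) / 3.
example = mse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```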
Do We Need Test Data?

I Why not just test on the training data?
[Figure: left, data with polynomial fits of increasing flexibility; right, Mean Squared Error against Flexibility]
I Flexibility is the degree of the polynomial being fit
I Gray line: training error, red line: testing error
Bias-Variance Decomposition
Y = f(X) + ε
Mean Squared Error can be decomposed as:

MSE = E[(Y − f̂(X))²] = Var(f̂(X)) + (E[f̂(X)] − f(X))² + Var(ε)

where f̂ is the learned function: Var(f̂(X)) is the Variance term and (E[f̂(X)] − f(X))² is the squared Bias.

I Bias: How well the method would work with infinite data
I Variance: How much the output changes with different data sets
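The decomposition can be checked numerically at one fixed point x0. A Python sketch; the prediction values, f(x0), and noise variance below are hypothetical numbers chosen for illustration:

```python
def decompose(preds_at_x0, true_value, noise_var):
    """Bias-variance decomposition of expected squared error at one point x0.

    preds_at_x0: predictions f_hat(x0) from models trained on different
    training sets; true_value: the true f(x0); noise_var: Var(epsilon).
    Returns (variance, bias_squared, expected_mse).
    """
    n = len(preds_at_x0)
    mean_pred = sum(preds_at_x0) / n
    variance = sum((p - mean_pred) ** 2 for p in preds_at_x0) / n
    bias_sq = (mean_pred - true_value) ** 2
    return variance, bias_sq, variance + bias_sq + noise_var

# Hypothetical predictions at x0 from five training sets; f(x0) = 3.5, Var(eps) = 1.0.
var, bias_sq, total = decompose([2.9, 3.1, 3.0, 2.8, 3.2], 3.5, 1.0)
```

Here the spread of the predictions gives the variance, the gap between their mean and the truth gives the squared bias, and the irreducible noise is added on top.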
Bias-Variance Trade-off
[Figure: three panels plotting MSE, squared Bias, and Variance against Flexibility]
Types of Function f
Regression: continuous target
f : X → R
[Figure: 3-D regression surface of Income over Years of Education and Seniority]
Classification: discrete target
f : X → {1, 2, 3, . . . , k}
[Figure: two-class scatter plot in features X1 and X2]
Regression or Classification?

Application                                  Regression  Classification
Identify risk of getting a disease
Predict effectiveness of a treatment
Recognize hand-written text
Speech recognition
Predict probability of an employee leaving
Error Rate In Classification
I Learned function f
I Test data: (x1, y1), (x2, y2), . . .
I Error rate:

(1/n) ∑_{i=1}^n I(y_i ≠ f(x_i))

I Bayes classifier: assign each observation to the most likely class

f(x) = argmax_j Pr[Y = j | X = x]

I Bayes classifier would require the true conditional distribution Pr[Y | X]
I Its error rate is a lower bound on the achievable error
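Both ideas fit in a few lines of Python; the class labels and conditional probabilities below are made-up examples:

```python
def error_rate(targets, predictions):
    """Fraction of test points where the predicted class differs from the target."""
    n = len(targets)
    return sum(1 for y, p in zip(targets, predictions) if y != p) / n

def bayes_classifier(cond_probs):
    """Given Pr[Y = j | X = x] for each class j, return the most likely class."""
    return max(cond_probs, key=cond_probs.get)

# One of four predictions is wrong, so the error rate is 0.25.
err = error_rate(["spam", "ham", "ham", "spam"], ["spam", "ham", "spam", "spam"])

# With these hypothetical conditional probabilities, the Bayes rule picks "ham".
label = bayes_classifier({"spam": 0.3, "ham": 0.7})
```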
KNN: K-Nearest Neighbors
I Bayes classifier can only predict the values x in the training set
I Idea: Use similar training points when making predictions
[Figure: a test point and its three nearest training points from two classes]
I Non-parametric method (unlike regression)
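A minimal KNN sketch in Python, using squared Euclidean distance and a majority vote; the toy training points are made up:

```python
def knn_predict(train, x, k):
    """k-nearest-neighbors: majority vote among the k training points closest to x.

    train is a list of (features, label) pairs; features are numeric tuples.
    Squared distance is enough, since only the ordering of neighbors matters.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(train, key=lambda point: dist2(point[0], x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Toy two-class data: three "blue" points near the origin, two "orange" far away.
train = [((0, 0), "blue"), ((0, 1), "blue"), ((1, 0), "blue"),
         ((5, 5), "orange"), ((5, 6), "orange")]
prediction = knn_predict(train, (0.2, 0.2), k=3)  # all 3 neighbors are "blue"
```

Note there is no training step at all: the "model" is the training set itself, which is what makes the method non-parametric.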
KNN: Choosing k
[Figure: KNN decision boundaries on two-class data for K=1 (left) and K=100 (right)]
KNN: Training and Test Errors
[Figure: training and test error rates plotted against 1/K]
R Language
I Download and install R: http://cran.r-project.org
I Try using RStudio as an R IDE
I Read the R lab: ISL 2.3
I Use Piazza for questions
Alternatives to R
I Python (+ NumPy + Pandas + Matplotlib + scikit-learn + SciPy)
  More popular, more flexible, but less mature
I MATLAB
  Linear algebra and control focus, but a lot of ML toolkits
I Julia
  New technical computing language, "walks like Python, runs like C", but least mature