About functional SIR


Transcript of About functional SIR

Page 1: About functional SIR

About functional SIR
Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix

[email protected]
http://www.nathalievilla.org

Journées “Données fonctionnelles”
Institut de Mathématiques de Toulouse, June 19th 2017

Nathalie Villa-Vialaneix | SISIR 1/34

Page 2: About functional SIR

A joint work of the SFCB team

Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix

Nathalie Villa-Vialaneix | SISIR 2/34

Page 3: About functional SIR

Outline

1 Background and motivation

2 Presentation of SIR

3 Our proposal

4 Simulations and Real data

Nathalie Villa-Vialaneix | SISIR 3/34


Page 5: About functional SIR

Introduction

X a functional random variable and Y ∈ R

n i.i.d. realizations of (X, Y)

Nathalie Villa-Vialaneix | SISIR 5/34

Page 6: About functional SIR

Objectives

variable selection in functional regression

selection of full intervals made of consecutive points

without any a priori information on the intervals

a fully data-driven procedure

Nathalie Villa-Vialaneix | SISIR 6/34

Page 7: About functional SIR

Question and mathematical framework

A functional regression problem: X a functional random variable and Y a real random variable. What is E(Y | X)?

Data: n i.i.d. observations (x_i, y_i)_{i=1,...,n}. x_i is not perfectly known but sampled at (fixed) points:

x_i = (x_i(t_1), . . . , x_i(t_p))^T ∈ R^p. We denote by X the (n × p) matrix with rows x_1^T, . . . , x_n^T.

Question: Find a model that is easily interpretable and points out intervals, within the definition domain of X, that are relevant for the prediction.

Method: Do not expand X on a functional basis, but use the fact that the entries of the digitized function x_i are ordered in a natural way.

Nathalie Villa-Vialaneix | SISIR 7/34


Page 11: About functional SIR

Related works (variable selection in FDA)

LASSO / L1 regularization in linear models: [Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis)

[Fraiman et al., 2016]: a blinding approach usable for various problems (PCA, regression, ...)

[Gregorutti et al., 2015]: adaptation of the variable importance in random forests to groups of variables

[Fauvel et al., 2015, Ferraty and Hall, 2015]: cross-validation and a greedy update of the selected evaluation points to select the most relevant ones in a nonparametric framework

However, none of these approaches proposes to automatically design and select contiguous sets of variables.

Nathalie Villa-Vialaneix | SISIR 8/34


Page 13: About functional SIR

Related works (selection of groups of variables)

[James et al., 2009]: L1 regularization in a linear model with sparsity on derivatives, yielding piecewise-constant predictors

[Park et al., 2016]: criterion based on the minimization of the overall correlation during a greedy segmentation

[Grollemund et al., 2017]: Bayesian approach in which a posterior distribution over informative intervals can be obtained

All are proposed in the framework of the linear model, and the second one does not use the target variable to define and select the relevant intervals.

Our proposal: a semi-parametric (not entirely linear) model which selects relevant intervals, combined with an automatic procedure to define the intervals.

Nathalie Villa-Vialaneix | SISIR 9/34


Page 16: About functional SIR

Outline

1 Background and motivation

2 Presentation of SIR

3 Our proposal

4 Simulations and Real data

Nathalie Villa-Vialaneix | SISIR 10/34

Page 17: About functional SIR

SIR in multidimensional framework

SIR: a semi-parametric regression model for X ∈ R^p:

Y = F(a_1^T X, . . . , a_d^T X, ε)

for a_1, . . . , a_d ∈ R^p (to be estimated), F: R^{d+1} → R unknown, and ε an error term independent of X.

Standard assumption for SIR:

Y ⊥ X | P_A(X),

in which A is the so-called EDR space, spanned by (a_k)_{k=1,...,d}.

SIR is the regression extension of Linear Discriminant Analysis.

Nathalie Villa-Vialaneix | SISIR 11/34


Page 19: About functional SIR

Estimation

Equivalence between SIR and an eigendecomposition

A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem Γa = λΣa, where Σ is the covariance matrix of X and Γ is the covariance matrix of E(X|Y).

Estimation (when n > p):

compute X̄ = (1/n) Σ_{i=1}^n x_i and Σ̂ = (1/n) X^T (X − 1_n X̄^T)

split the range of Y into H slices τ_1, . . . , τ_H and estimate E(X|Y) by the slice means X̄_h = (1/n_h) Σ_{i: y_i ∈ τ_h} x_i, h = 1, . . . , H, with n_h = |{i: y_i ∈ τ_h}|, to obtain an estimate Γ̂ of Γ

solve the eigendecomposition problem Γ̂a = λΣ̂a and obtain the eigenvectors a_1, . . . , a_d

Nathalie Villa-Vialaneix | SISIR 12/34
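The estimation steps above can be sketched in a few lines of numpy; this is an illustrative toy implementation (not the SISIR package), assuming n > p so that Σ̂ is invertible, with equal-count slices:

```python
# A minimal numpy sketch of the SIR estimation steps above (illustrative only).
import numpy as np

def sir_directions(X, y, H=10, d=1):
    """Estimate d EDR directions by sliced inverse regression (n > p assumed)."""
    n, p = X.shape
    x_bar = X.mean(axis=0)
    Sigma = (X - x_bar).T @ (X - x_bar) / n           # covariance of X

    # Slice the range of Y into H slices with (roughly) equal counts
    Gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), H):
        m_h = X[idx].mean(axis=0) - x_bar             # centered slice mean
        Gamma += (len(idx) / n) * np.outer(m_h, m_h)  # covariance of E(X|Y)

    # Generalized eigenproblem Gamma a = lambda Sigma a
    vals, vecs = np.linalg.eig(np.linalg.solve(Sigma, Gamma))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:d]].real

# Toy usage: Y depends on X only through a single direction a1
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
a1 = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
y = X @ a1 + 0.1 * rng.normal(size=300)
A = sir_directions(X, y, H=10, d=1)
```

On this toy example the first estimated direction is close (up to sign and scale) to a1.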


Page 24: About functional SIR

SIR in large dimensions: problem

In large dimension (or in Functional Data Analysis), n < p, so Σ is ill-conditioned and does not have an inverse ⇒ Z = (X − 1_n X̄^T) Σ^{−1/2} cannot be computed.

Different solutions have been proposed in the literature, based on:

prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the framework of FDA)

regularization (ridge, ...) [Li and Yin, 2008, Bernard-Michel et al., 2008]: equivalent to the generalized eigendecomposition problem Γa = λ(Σ + µ_2 I)a

sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005]

QZ-SIR [Coudret et al., 2014]: uses a method similar to the QR algorithm

Nathalie Villa-Vialaneix | SISIR 13/34
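The ridge option above amounts to a one-line change in the eigenproblem: replace Σ by Σ + µ_2 I so that it stays solvable when Σ is singular. A toy sketch (illustrative only, not the cited authors' implementation):

```python
# Ridge-regularized SIR step: Gamma a = lambda (Sigma + mu2 I) a,
# usable when Sigma is singular (n < p). Illustrative sketch only.
import numpy as np

def ridge_sir_directions(Sigma, Gamma, mu2, d=1):
    p = Sigma.shape[0]
    M = np.linalg.solve(Sigma + mu2 * np.eye(p), Gamma)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:d]].real

# Toy check with a singular Sigma (rank 3 < p = 6), as happens when n < p
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 6))
Sigma = B.T @ B / 3          # rank-deficient empirical covariance
Gamma = np.outer(B[0], B[0])
A = ridge_sir_directions(Sigma, Gamma, mu2=0.1, d=1)
```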


Page 26: About functional SIR

SIR in large dimensions: sparse versions

Specific issue when introducing sparsity in SIR: sparsity on a multiple-index model. Most authors use shrinkage approaches, or sparsity on a single-index model with deflation (not shown).

First version: Li and Yin (2008), based on the regression formulation
  Pro: sparsity common to all d dimensions
  Con: minimization problem with dependent variables in R^p

Second version: Li and Nachtsheim (2008), based on the correlation formulation
  Pro: minimization problem with independent variables in R^d
  Con: sparsity different in each of the d dimensions

Nathalie Villa-Vialaneix | SISIR 14/34

Page 27: About functional SIR

Equivalent formulations

SIR as a regression problem: [Li and Yin, 2008] shows that SIR is equivalent to the (double) minimization of

E(A, C) = Σ_{h=1}^H p_h ‖(X̄_h − X̄) − Σ A C_h‖²

for X̄_h = (1/n_h) Σ_{i: y_i ∈ τ_h} x_i, A a (p × d) matrix and C_h vectors in R^d.

Rk: Given A, C is obtained as the solution of an ordinary least squares problem.

SIR as a canonical correlation problem: [Li and Nachtsheim, 2008] shows that SIR rewrites as the double optimization problem max_{a_j, φ} Cor(φ(Y), a_j^T X), where φ is any function R → R and the (a_j)_j are Σ-orthonormal.

Rk: The solution is shown to satisfy φ(y) = a_j^T E(X|Y = y), and a_j is also obtained as the solution of the mean squared error problem min_{a_j} E[(φ(Y) − a_j^T X)²].

Nathalie Villa-Vialaneix | SISIR 15/34


Page 31: About functional SIR

SIR in large dimensions: sparse versions

First version: sparse penalization of the ridge solution. If (Â, Ĉ) is the solution of ridge SIR, [Ni et al., 2005, Li and Yin, 2008] propose to shrink this solution by minimizing

E_{s,1}(α) = Σ_{h=1}^H p_h ‖(X̄_h − X̄) − Σ Diag(α) Â Ĉ_h‖² + µ_1 ‖α‖_{L1}

(regression formulation of SIR)

Nathalie Villa-Vialaneix | SISIR 16/34

Page 32: About functional SIR

SIR in large dimensions: sparse versions

Second version: [Li and Nachtsheim, 2008] derive the sparse optimization problem from the correlation formulation of SIR:

min_{a_j^s} Σ_{i=1}^n [P_{a_j}(X̄|y_i) − (a_j^s)^T x_i]² + µ_{1,j} ‖a_j^s‖_{L1},

in which P_{a_j}(X̄|y_i) is the projection of Ê(X|Y = y_i) = X̄_h onto the space spanned by the solution of the ridge problem.

Nathalie Villa-Vialaneix | SISIR 16/34

Page 33: About functional SIR

Characteristics of the different approaches and possible extensions

                        [Li and Yin, 2008]       [Li and Nachtsheim, 2008]
sparsity on             shrinkage coefficients   estimates
nb of optimization pbs  1                        d
sparsity                common to all dims       specific to each dim

Nathalie Villa-Vialaneix | SISIR 17/34

Page 34: About functional SIR

Outline

1 Background and motivation

2 Presentation of SIR

3 Our proposal

4 Simulations and Real data

Nathalie Villa-Vialaneix | SISIR 18/34

Page 35: About functional SIR

SIR in large dimensions: our sparse version

Background: Back in the functional setting, we suppose that t_1, . . . , t_p are split into D intervals I_1, . . . , I_D.

Based on the minimization problem of Li and Nachtsheim (2008).

Our adaptation: sparsity on the intervals using α = (α_1, . . . , α_D):

∀ l = 1, . . . , p, a^s_{jl} = α_k a_{jl} for the k such that t_l ∈ I_k.

The sparsity constraint is put on α and not directly on a_j^s.

The α are made identical for all dimensions j = 1, . . . , d of the projection.

Nathalie Villa-Vialaneix | SISIR 19/34

Page 36: About functional SIR

SIR in large dimensions: our sparse version

Li and Nachtsheim (2008) (LASSO):

min_{a_j^s} Σ_{i=1}^n ‖P_{a_j}(X̄|y_i) − (a_j^s)^T x_i‖² + µ_{1,j} ‖a_j^s‖_{L1},

in which P_{a_j}(X̄|y_i) is the projection of Ê(X|Y = y_i) = X̄_h (for the h such that y_i is in slice h) onto the space spanned by the a_j.

Our adaptation:

α̂ = argmin_{α ∈ R^D} Σ_{j=1}^d Σ_{i=1}^n ‖P_{a_j}(X̄|y_i) − (Λ(α) a_j)^T x_i‖² + µ_1 ‖α‖_{L1}

with ∀ l = 1, . . . , p, a^s_{jl} = α_k a_{jl} for the k such that t_l ∈ I_k, and

Λ(α) = Diag(α_1 I_{|I_1|}, . . . , α_D I_{|I_D|}) ∈ M_{p×p}.

Nathalie Villa-Vialaneix | SISIR 20/34
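The reparameterization above can be checked numerically: since (Λ(α)a_j)^T x_i = Σ_k α_k Σ_{l∈I_k} a_{jl} x_{il}, the α-problem is an ordinary D-dimensional LASSO on the aggregated features z_{ik} = Σ_{l∈I_k} a_{jl} x_{il}. A minimal sketch (the sizes p, D and the contiguous intervals are hypothetical, for illustration):

```python
import numpy as np

# Hypothetical sizes for illustration
p, D = 12, 3
intervals = np.array_split(np.arange(p), D)   # I_1, ..., I_D: contiguous index sets

def Lambda(alpha):
    """Block-diagonal shrinkage matrix Diag(alpha_1 I_|I1|, ..., alpha_D I_|ID|)."""
    diag = np.concatenate([np.full(len(I), a_k) for I, a_k in zip(intervals, alpha)])
    return np.diag(diag)

rng = np.random.default_rng(1)
a = rng.normal(size=p)                 # one EDR direction a_j
x = rng.normal(size=p)                 # one observation x_i
alpha = np.array([0.0, 1.0, 0.3])      # one interval fully shrunk to zero

# (Lambda(alpha) a)^T x  ==  alpha^T z, with z_k = sum_{l in I_k} a_l x_l,
# so the alpha-LASSO reduces to a standard LASSO on D features.
z = np.array([a[I] @ x[I] for I in intervals])
lhs = (Lambda(alpha) @ a) @ x
rhs = alpha @ z
print(np.allclose(lhs, rhs))   # True
```

This is why the second step stays a standard LASSO problem, only in dimension D instead of p.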

Page 37: About functional SIR

Summary. SISIR: a two-step approach

First step: solve the projection problem (using SIR and L2 regularization of Σ), which provides the estimates (a_j)_{j∈{1,...,d}} of the vectors spanning the EDR space.

Second step: sparsity on the D intervals using α = (α_1, . . . , α_D), solving a LASSO problem: this handles the functional setting by penalizing entire intervals and not just isolated points.

Nathalie Villa-Vialaneix | SISIR 21/34

Page 38: About functional SIR

SISIR: Characteristics

uses the approach based on the correlation formulation (because thedimensionality of the optimization problem is smaller);

uses a shrinkage approach and optimizes shrinkage coefficients in asingle optimization problem;

handles functional setting by penalizing entire intervals and not justisolated points.

Nathalie Villa-Vialaneix | SISIR 22/34

Page 39: About functional SIR

An automatic approach to define intervals

1 Initial state: ∀ k = 1, . . . , p, τ_k = {t_k}

2 Iterate:
   along the regularization path, select three values for µ_1: P% of the coefficients are zero, P% of the coefficients are non-zero, and best GCV;
   define D− (“strong zeros”) and D+ (“strong non-zeros”);
   merge consecutive “strong zeros” (resp. “strong non-zeros”), or “strong zeros” (resp. “strong non-zeros”) separated by a small number of intervals of undetermined type;
   until no more iterations can be performed.

3 Output: a collection of models (the first with p intervals, the last with 1), M*_D (optimal for GCV), and the corresponding GCV_D versus D (number of intervals).

Final solution: minimize GCV_D over D.

Nathalie Villa-Vialaneix | SISIR 23/34
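The merging step can be sketched as follows; this is an illustrative toy version (not the SISIR package's code), where the tolerated number of undetermined intervals between two strong ones is a hypothetical parameter `gap`:

```python
# Toy sketch of the interval-merging step. Each interval carries a label:
# '-' (strong zero), '+' (strong non-zero) or '?' (undetermined). Consecutive
# intervals sharing a strong label, possibly separated by at most `gap`
# undetermined intervals, are merged into one interval.
def merge_intervals(labels, gap=1):
    """Return the merged partition as a list of (label, count) runs."""
    merged = []
    i = 0
    while i < len(labels):
        lab = labels[i]
        j = i + 1
        if lab in "+-":
            while j < len(labels):
                # count undetermined intervals separating two strong ones
                k = j
                while k < len(labels) and labels[k] == "?":
                    k += 1
                if k < len(labels) and labels[k] == lab and (k - j) <= gap:
                    j = k + 1      # absorb the gap and the matching interval
                else:
                    break
        merged.append((lab, j - i))
        i = j
    return merged

print(merge_intervals(list("--?-++?++?")))
```

Each pass reduces the number of intervals D, which yields the collection of nested models indexed by D.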

Page 45: About functional SIR

Outline

1 Background and motivation

2 Presentation of SIR

3 Our proposal

4 Simulations and Real data

Nathalie Villa-Vialaneix | SISIR 24/34

Page 46: About functional SIR

Simulation framework

Data generated with:

X(t) a Gaussian process with mean µ(t) = −5 + 4t − 4t² and a Matérn covariance

a_j(t) = sin( t(2+j)π/2 − (j−1)π/3 ) 1_{I_j}(t)

Y = Σ_{j=1}^d log |⟨X, a_j⟩|

One model: (M1), d = 1, I_1 = [0.2, 0.4].

Nathalie Villa-Vialaneix | SISIR 25/34
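A minimal sketch of this simulation design, assuming a Matérn-3/2 covariance with a hypothetical range parameter `rho` (the slide does not specify these) and a Riemann sum for ⟨X, a_j⟩:

```python
import numpy as np

def simulate_m1(n=100, p=200, d=1, rho=0.1, seed=0):
    """Simulate model (M1): d = 1, I_1 = [0.2, 0.4]. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, p)
    mu = -5.0 + 4.0 * t - 4.0 * t**2                  # mean function
    # Matern 3/2 covariance (assumed smoothness), sampled on the grid t
    dist = np.abs(t[:, None] - t[None, :]) / rho
    K = (1.0 + np.sqrt(3.0) * dist) * np.exp(-np.sqrt(3.0) * dist)
    L = np.linalg.cholesky(K + 1e-8 * np.eye(p))      # jitter for stability
    X = mu + rng.normal(size=(n, p)) @ L.T            # n sample paths of X
    y = np.zeros(n)
    for j in range(1, d + 1):
        a_j = np.sin(t * (2 + j) * np.pi / 2 - (j - 1) * np.pi / 3)
        a_j *= (t >= 0.2) & (t <= 0.4)                # indicator of I_j
        y += np.log(np.abs(X @ a_j / p))              # <X, a_j> via Riemann sum
    return t, X, y

t, X, y = simulate_m1()
```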

Page 47: About functional SIR

Definition of the intervals

[Figure: estimated coefficients along the iterative definition of the intervals, for D = p = 200 (initial state = LASSO), D = 142, D = 41 and D = 5.]

Nathalie Villa-Vialaneix | SISIR 26/34

Page 48: About functional SIR

Second model

(M2): d = 3 and I_1 = [0, 0.1], I_2 = [0.5, 0.65] and I_3 = [0.65, 0.78].

Nathalie Villa-Vialaneix | SISIR 27/34

Page 49: About functional SIR

Second model

[Figure: intervals estimated by SISIR, compared with standard and sparse SIR.]

Nathalie Villa-Vialaneix | SISIR 28/34

Page 50: About functional SIR

Tecator dataset

relevant intervals

easily interpretable

good MSE

Nathalie Villa-Vialaneix | SISIR 29/34

Page 51: About functional SIR

Sunflower dataset

climatic time series (between 1975 and 2012, in France)

daily measures from April to October

X = evapotranspiration, Y = yield, n = 111, p = 309

Nathalie Villa-Vialaneix | SISIR 30/34

Page 52: About functional SIR

Sunflower dataset

only two points identified outside the interval

focus on the second half of the interval

matches expert knowledge

Nathalie Villa-Vialaneix | SISIR 31/34

Page 53: About functional SIR

Conclusion

SI-SIR:

sparse dimension reduction model adapted to functional framework

fully automated definition of relevant intervals in the range of the predictors

Package SISIR available on CRAN athttps://cran.r-project.org/package=SISIR.

Perspectives

adaptation to multiple X

application to large-scale real data (agricultural application: X = {temperature, rainfall, ...}, Y = {yield})

replace CV criterion?

Nathalie Villa-Vialaneix | SISIR 32/34

Page 54: About functional SIR

Nathalie Villa-Vialaneix | SISIR 33/34

Page 55: About functional SIR

Aneiros, G. and Vieu, P. (2014). Variable selection in infinite-dimensional problems. Statistics and Probability Letters, 94:12–20.

Bernard-Michel, C., Gardes, L., and Girard, S. (2008). A note on sliced inverse regression with regularizations. Biometrics, 64(3):982–986.

Coudret, R., Liquet, B., and Saracco, J. (2014). Comparison of sliced inverse regression approaches for underdetermined cases. Journal de la Société Française de Statistique, 155(2):72–96.

Fauvel, M., Deschene, C., Zullo, A., and Ferraty, F. (2015). Fast forward feature selection of hyperspectral images for classification with Gaussian mixture models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6):2824–2831.

Ferraty, F. and Hall, P. (2015). An algorithm for nonlinear, nonparametric model choice and prediction. Journal of Computational and Graphical Statistics, 24(3):695–714.

Ferraty, F., Hall, P., and Vieu, P. (2010). Most-predictive design points for functional data predictors. Biometrika, 97(4):807–824.

Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37(6):475–488.

Fraiman, R., Gimenez, Y., and Svarc, M. (2016). Feature selection for functional data. Journal of Multivariate Analysis, 146:191–208.

Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015). Grouped variable importance with random forests and application to multiple functional data analysis.

Nathalie Villa-Vialaneix | SISIR 33/34

Page 56: About functional SIR

Computational Statistics and Data Analysis, 90:15–35.

Grollemund, P., Abraham, C., Baragatti, M., and Pudlo, P. (2017). Bayesian functional linear regression with sparse step functions. Preprint.

James, G., Wang, J., and Zhu, J. (2009). Functional linear regression that's interpretable. Annals of Statistics, 37(5A):2083–2108.

Li, L. and Nachtsheim, C. (2008). Sparse sliced inverse regression. Technometrics, 48(4):503–510.

Li, L. and Yin, X. (2008). Sliced inverse regression with regularizations. Biometrics, 64(1):124–131.

Liquet, B. and Saracco, J. (2012). A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches. Computational Statistics, 27(1):103–125.

Matsui, H. and Konishi, S. (2011). Variable selection for functional regression models via the L1 regularization. Computational Statistics and Data Analysis, 55(12):3304–3310.

Ni, L., Cook, D., and Tsai, C. (2005). A note on shrinkage sliced inverse regression. Biometrika, 92(1):242–247.

Park, A., Aston, J., and Ferraty, F. (2016). Stable and predictive functional domain selection with application to brain images. Preprint arXiv:1606.02186.

Nathalie Villa-Vialaneix | SISIR 34/34

Page 57: About functional SIR

Parameter estimation

H (number of slices): usually, SIR is known to be not very sensitive to the number of slices (as long as H > d + 1). We took H = 10 (i.e., 10/30 observations per slice).

µ_2 and d (ridge estimate Â):

L-fold CV for µ_2 (for a d_0 large enough). Note that GCV as described in [Li and Yin, 2008] cannot be used, since the current version of the L2 penalty involves the use of an estimate of Σ^{−1}.

Using again L-fold CV, ∀ d = 1, . . . , d_0, an estimate of R(d) = d − E[Tr(Π_d Π̂_d)], in which Π_d and Π̂_d are the projectors onto the first d dimensions of the EDR space and of its estimate, is derived similarly as in [Liquet and Saracco, 2012]. The evolution of R(d) versus d is studied to select a relevant d.

µ_1 (LASSO): glmnet is used, in which µ_1 is selected by CV along the regularization path.

Nathalie Villa-Vialaneix | SISIR 34/34
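The R(d) criterion compares the projectors onto the true and estimated EDR subspaces; a minimal sketch in a simplified Euclidean setting (the slide uses Σ-orthogonal projectors, here plain orthogonal ones for illustration):

```python
import numpy as np

def r_criterion(A_true, A_hat):
    """d - Tr(P P_hat) for the projectors onto span(A_true) and span(A_hat)."""
    Q, _ = np.linalg.qr(A_true)     # orthonormal basis of the true subspace
    Qh, _ = np.linalg.qr(A_hat)     # orthonormal basis of the estimate
    d = A_true.shape[1]
    return d - np.trace((Q @ Q.T) @ (Qh @ Qh.T))

# Identical subspaces give R = 0; orthogonal ones give R = d.
A = np.array([[1.0], [0.0], [0.0]])
print(round(r_criterion(A, A), 6))                                # 0.0
print(round(r_criterion(A, np.array([[0.0], [1.0], [0.0]])), 6))  # 1.0
```

Small values of R(d) thus indicate that the estimated subspace of dimension d captures the true one, which motivates inspecting R(d) versus d.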