About functional SIR


Victor Picheny, Rémi Servien & Nathalie Villa-Vialaneix

nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org

Journées “Données fonctionnelles”, Institut de Mathématiques de Toulouse, June 19, 2017

A joint work of the SFCB team: Victor Picheny, Rémi Servien and Nathalie Villa-Vialaneix

Outline

1 Background and motivation

2 Presentation of SIR

3 Our proposal

4 Simulations and Real data

1 Background and motivation

Introduction

X: a functional random variable and Y ∈ R

n i.i.d. realizations of (X, Y)

Objectives

variable selection in functional regression

selection of full intervals made of consecutive points

without any a priori information on the intervals

a fully data-driven procedure

Question and mathematical framework

A functional regression problem: X, a functional random variable, and Y, a real random variable:

E(Y | X)?

Data: n i.i.d. observations (x_i, y_i)_{i=1,...,n}. x_i is not perfectly known, but sampled at (fixed) points:

x_i = (x_i(t_1), ..., x_i(t_p))^T ∈ R^p.

We denote by X the (n × p) matrix with rows x_1^T, ..., x_n^T.

Question: Find a model that is easily interpretable and points out the relevant intervals for the prediction within the definition domain of X.

Method: Do not expand X on a functional basis, but use the fact that the entries of the digitized function x_i are ordered in a natural way.

Related works (variable selection in FDA)

LASSO / L1 regularization in linear models: [Ferraty et al., 2010, Aneiros and Vieu, 2014] (isolated evaluation points), [Matsui and Konishi, 2011] (selects elements of an expansion basis)

[Fraiman et al., 2016]: a blinding approach usable for various problems (PCA, regression...)

[Gregorutti et al., 2015]: adaptation of the variable importance of random forests to groups of variables

[Fauvel et al., 2015, Ferraty and Hall, 2015]: cross-validation and a greedy update of the selected evaluation points to select the most relevant evaluation points in a nonparametric framework

However, none of these approaches proposes to automatically design and select contiguous sets of variables.

Related works (selection of groups of variables)

[James et al., 2009]: L1 regularization in the linear model with sparsity on the derivatives: piecewise-constant predictors

[Park et al., 2016]: a criterion based on the minimization of the overall correlation during a greedy segmentation

[Grollemund et al., 2017]: a Bayesian approach in which a posterior distribution on the informative intervals can be obtained

All are proposed in the framework of the linear model, and the second one does not use the target variable to define and select the relevant intervals.

Our proposal: a semi-parametric (not entirely linear) model which selects relevant intervals, combined with an automatic procedure to define the intervals.

2 Presentation of SIR

SIR in the multidimensional framework

SIR: a semi-parametric regression model for X ∈ R^p:

Y = F(a_1^T X, ..., a_d^T X, ε)

for a_1, ..., a_d ∈ R^p (to be estimated), F: R^{d+1} → R unknown, and ε an error term, independent of X.

Standard assumption for SIR:

Y ⊥ X | P_A(X),

in which A is the so-called EDR space, spanned by (a_k)_{k=1,...,d}.

SIR is the regression extension of Linear Discriminant Analysis.

Estimation

Equivalence between SIR and an eigendecomposition: A is included in the space spanned by the first d Σ-orthogonal eigenvectors of the generalized eigendecomposition problem Γa = λΣa, with Σ the covariance matrix of X and Γ the covariance matrix of E(X | Y).

Estimation (when n > p):

compute X̄ = (1/n) ∑_{i=1}^n x_i and Σ̂ = (1/n) ∑_{i=1}^n (x_i − X̄)(x_i − X̄)^T;

split the range of Y into H different slices, τ_1, ..., τ_H, and estimate E(X | Y) by the within-slice means ((1/n_h) ∑_{i: y_i ∈ τ_h} x_i)_{h=1,...,H}, with n_h = |{i: y_i ∈ τ_h}|, to obtain an estimate Γ̂ of Γ;

solve the eigendecomposition problem Γ̂a = λΣ̂a and obtain the eigenvectors a_1, ..., a_d (a minimal code sketch of these three steps is given after this list).
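The promised sketch, in R: a toy illustration only, not the SISIR package implementation. It assumes n > p (so that Σ̂ is invertible) and builds slices from empirical quantiles of y (assumed without heavy ties), which the slide leaves unspecified.

```r
# Toy SIR estimation following the three steps above (assumes n > p).
sir_edr <- function(X, y, H = 10, d = 2) {
  n <- nrow(X)
  Xbar <- colMeans(X)
  Xc <- sweep(X, 2, Xbar)                      # center the observations x_i
  Sigma <- crossprod(Xc) / n                   # empirical covariance of X
  breaks <- quantile(y, probs = seq(0, 1, length.out = H + 1))
  slices <- cut(y, breaks = breaks, include.lowest = TRUE)
  ph <- as.numeric(table(slices)) / n          # slice frequencies n_h / n
  # within-slice means of X (estimate of E(X | Y)), centered around Xbar
  Mh <- t(sapply(levels(slices),
                 function(h) colMeans(X[slices == h, , drop = FALSE]) - Xbar))
  Gamma <- crossprod(Mh * sqrt(ph))            # estimate of Cov(E(X | Y))
  # generalized eigenproblem Gamma a = lambda Sigma a, solved through
  # the (possibly non-symmetric) matrix Sigma^{-1} Gamma
  eig <- eigen(solve(Sigma, Gamma))
  Re(eig$vectors[, seq_len(d), drop = FALSE])  # EDR directions a_1, ..., a_d
}
```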

SIR in large dimensions: problem

In large dimensions (or in Functional Data Analysis), n < p, so Σ̂ is ill-conditioned and does not have an inverse ⇒ Z = (X − 1_n X̄^T) Σ̂^{−1/2} cannot be computed.

Different solutions have been proposed in the literature, based on:

prior dimension reduction (e.g., PCA) [Ferré and Yao, 2003] (in the framework of FDA)

regularization (ridge...) [Li and Yin, 2008, Bernard-Michel et al., 2008]: equivalent to the generalized eigendecomposition problem Γa = λ(Σ + µ_2 I)a (see the sketch after this list)

sparse SIR [Li and Yin, 2008, Li and Nachtsheim, 2008, Ni et al., 2005]

QZ-SIR [Coudret et al., 2014]: uses a method similar to the QR algorithm
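The ridge sketch promised above: the only change to the toy sir_edr is the matrix used in the generalized eigenproblem. µ_2 is assumed given here; its selection by cross-validation is discussed in the appendix.

```r
# Ridge variant of the toy sir_edr above, usable when n < p: Sigma is
# replaced by Sigma + mu2 * I in the generalized eigenproblem.
sir_edr_ridge <- function(X, y, H = 10, d = 2, mu2 = 0.1) {
  n <- nrow(X); p <- ncol(X)
  Xbar <- colMeans(X)
  Xc <- sweep(X, 2, Xbar)
  Sigma <- crossprod(Xc) / n
  slices <- cut(y, quantile(y, seq(0, 1, length.out = H + 1)),
                include.lowest = TRUE)
  ph <- as.numeric(table(slices)) / n
  Mh <- t(sapply(levels(slices),
                 function(h) colMeans(X[slices == h, , drop = FALSE]) - Xbar))
  Gamma <- crossprod(Mh * sqrt(ph))
  # Gamma a = lambda (Sigma + mu2 I) a
  eig <- eigen(solve(Sigma + mu2 * diag(p), Gamma))
  Re(eig$vectors[, seq_len(d), drop = FALSE])
}
```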

SIR in large dimensions: sparse versions

A specific issue when introducing sparsity in SIR: sparsity on a multiple-index model. Most authors use shrinkage approaches, or sparsity on a single-index model followed by deflation (not shown).

First version: Li and Yin (2008), based on the regression formulation
Pro: sparsity common to all d dimensions
Con: a minimization problem with dependent variables, in R^p

Second version: Li and Nachtsheim (2008), based on the correlation formulation
Pro: a minimization problem with independent variables, in R^d
Con: sparsity different in each of the d dimensions

Equivalent formulations

SIR as a regression problem: [Li and Yin, 2008] shows that SIR is equivalent to the (double) minimization of

E(A, C) = ∑_{h=1}^H p_h ‖(X̄_h − X̄) − Σ A C_h‖²

for X̄_h = (1/n_h) ∑_{i: y_i ∈ τ_h} x_i, A a (p × d) matrix and C = (C_h)_h with C_h ∈ R^d.

Rk: Given A, C is obtained as the solution of an ordinary least squares problem (made explicit below).

SIR as a canonical correlation problem: [Li and Nachtsheim, 2008] shows that SIR rewrites as the double optimization problem

max_{a_j, φ} Cor(φ(Y), a_j^T X),

where φ is any function R → R and the (a_j)_j are Σ-orthonormal.

Rk: The solution is shown to satisfy φ(y) = a_j^T E(X | Y = y), and a_j is also obtained as the solution of the mean square error problem:

min_{a_j} E[(φ(Y) − a_j^T X)²]
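The first remark can be made explicit: for a fixed A, E(A, C) decouples over the slices into H separate ordinary least squares problems in C_h. A short derivation, standard OLS algebra that the slide does not spell out:

```latex
% For fixed A, the h-th term of E(A, C) is a least squares problem in C_h
% with design matrix \Sigma A and response \bar{X}_h - \bar{X}, whence
\hat{C}_h = \bigl((\Sigma A)^T \Sigma A\bigr)^{-1} (\Sigma A)^T
            \bigl(\bar{X}_h - \bar{X}\bigr), \qquad h = 1, \dots, H.
```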

SIR in large dimensions: sparse versions

First version: sparse penalization of the ridge solution. If (Â, Ĉ) are the solutions of ridge SIR, [Ni et al., 2005, Li and Yin, 2008] propose to shrink this solution by minimizing

E_{s,1}(α) = ∑_{h=1}^H p_h ‖(X̄_h − X̄) − Σ̂ Diag(α) Â Ĉ_h‖² + µ_1 ‖α‖_{L1}

(regression formulation of SIR).

Second version: [Li and Nachtsheim, 2008] derive the sparse optimization problem from the correlation formulation of SIR:

min_{a_j^s} ∑_{i=1}^n [P_{a_j}(X | y_i) − (a_j^s)^T x_i]² + µ_{1,j} ‖a_j^s‖_{L1},

in which P_{a_j}(X | y_i) is the projection of E(X | Y = y_i) = X̄_h onto the space spanned by the solutions of the ridge problem.

Characteristics of the different approaches and possible extensions

[Li and Yin, 2008]: sparsity imposed on the shrinkage coefficients; a single optimization problem; sparsity common to all dimensions.

[Li and Nachtsheim, 2008]: sparsity imposed on the estimates themselves; d optimization problems; sparsity specific to each dimension.

3 Our proposal

SIR in large dimensions: our sparse version

Background: back in the functional setting, we suppose that t_1, ..., t_p are split into D intervals I_1, ..., I_D.

Our adaptation is based on the minimization problem of Li and Nachtsheim (2008), with sparsity on the intervals through α = (α_1, ..., α_D):

∀ l = 1, ..., p, a_{jl}^s = α_k a_{jl} for the k such that t_l ∈ I_k.

The sparsity constraint is put on α and not directly on the a_j^s, and the α are made identical for all dimensions j = 1, ..., d of the projection.

Li and Nachtsheim (2008) (LASSO):

min_{a_j^s} ∑_{i=1}^n ‖P_{a_j}(X | y_i) − (a_j^s)^T x_i‖² + µ_{1,j} ‖a_j^s‖_{L1},

in which P_{a_j}(X | y_i) is the projection of E(X | Y = y_i) = X̄_h (for the h such that y_i is in slice h) onto the space spanned by the a_j.

Our adaptation:

α̂ = arg min_{α ∈ R^D} ∑_{j=1}^d ∑_{i=1}^n ‖P_{a_j}(X | y_i) − (Λ(α) a_j)^T x_i‖² + µ_1 ‖α‖_{L1},

with ∀ l = 1, ..., p, a_{jl}^s = α_k a_{jl} for the k such that t_l ∈ I_k, and

Λ(α) = Diag(α_1 I_{|I_1|}, ..., α_D I_{|I_D|}) ∈ M_{p×p}.
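This optimization problem is a standard LASSO in α, since (Λ(α) a_j)^T x_i is linear in α. A hedged R sketch with glmnet, illustrating the recasting rather than reproducing the SISIR package code; the inputs A, intervals and resp are assumed precomputed from the ridge-SIR step.

```r
library(glmnet)

# The coefficient of alpha_k in (Lambda(alpha) a_j)^T x_i is
# sum_{l: t_l in I_k} a_jl * x_il. Assumed inputs:
#   X         n x p matrix of discretized curves
#   A         p x d matrix of ridge-SIR directions a_1, ..., a_d
#   intervals length-p vector giving the interval index k of each t_l
#   resp      length n*d vector stacking the scalar projections
#             P_{a_j}(X | y_i) over the d dimensions
interval_design <- function(X, A, intervals) {
  D <- max(intervals)
  blocks <- lapply(seq_len(ncol(A)), function(j) {
    sapply(seq_len(D), function(k) {
      idx <- which(intervals == k)
      as.numeric(X[, idx, drop = FALSE] %*% A[idx, j])  # sum over t_l in I_k
    })
  })
  do.call(rbind, blocks)            # (n*d) x D design matrix in alpha
}

# Z   <- interval_design(X, A, intervals)
# fit <- cv.glmnet(Z, resp, intercept = FALSE)  # mu_1 chosen on the path
# alpha_hat <- as.numeric(coef(fit, s = "lambda.min"))[-1]
```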

Summary: SISIR, a two-step approach

First step: solve the projection problem (using SIR and an L2 regularization of Σ), which provides the estimates (a_j)_{j=1,...,d} of the vectors spanning the EDR space.

Second step: sparsity on the D intervals through α = (α_1, ..., α_D), obtained by solving a LASSO problem: this handles the functional setting by penalizing entire intervals and not just isolated points.

SISIR: characteristics

uses the approach based on the correlation formulation (because the dimensionality of the optimization problem is smaller);

uses a shrinkage approach and optimizes the shrinkage coefficients in a single optimization problem;

handles the functional setting by penalizing entire intervals and not just isolated points.

An automatic approach to define the intervals

1 Initial state: ∀ k = 1, ..., p, τ_k = {t_k}.

2 Iterate:

along the regularization path, select three values of µ_1: one for which P% of the coefficients are zero, one for which P% of the coefficients are non-zero, and the one with the best GCV;

define D− (“strong zeros”) and D+ (“strong non-zeros”);

merge consecutive “strong zeros” (resp. “strong non-zeros”), or “strong zeros” (resp. “strong non-zeros”) separated by a small number of intervals of undetermined type (a toy version of this merge step is sketched after this list);

until no more merges can be performed.

3 Output: a collection of models (the first with p intervals, the last with 1), the model M*_D (optimal for GCV), and the corresponding GCV_D versus D (the number of intervals).

Final solution: minimize GCV_D over D.
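The merge step can be pictured with a toy rule on per-interval labels. The sketch below illustrates the idea rather than the exact SISIR heuristic, and the tolerated gap of undetermined intervals is an assumed parameter:

```r
# Toy merge step: labels holds one code per current interval
# (-1 = "strong zero", +1 = "strong non-zero", 0 = undetermined).
# Undetermined runs of length <= gap squeezed between two identical labels
# are absorbed, then consecutive intervals with equal labels are merged.
merge_intervals <- function(labels, gap = 1) {
  r <- rle(labels)
  for (k in seq_along(r$values)) {
    if (r$values[k] == 0 && r$lengths[k] <= gap &&
        k > 1 && k < length(r$values) &&
        r$values[k - 1] == r$values[k + 1] && r$values[k - 1] != 0) {
      r$values[k] <- r$values[k - 1]     # absorb the short undetermined run
    }
  }
  relabeled <- inverse.rle(r)
  cumsum(c(1, diff(relabeled) != 0))     # new interval index per old interval
}

merge_intervals(c(-1, -1, 0, -1, 1, 1), gap = 1)   # -> 1 1 1 1 2 2
```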

4 Simulations and Real data

Simulation framework

Data generated with:

X(t): a Gaussian process with mean µ(t) = −5 + 4t − 4t² and a Matérn covariance

a_j(t) = sin((2 + j)πt/2 − (j − 1)π/3) I_{I_j}(t)

Y = ∑_{j=1}^d log |⟨X, a_j⟩|

One model: (M1), with d = 1 and I_1 = [0.2, 0.4] (a toy generator is sketched below).
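A toy generator for (M1), under assumptions the slide leaves open: a Matérn 3/2 covariance with range 0.2 and unit variance, p equally spaced points on [0, 1], and the inner product ⟨X, a_1⟩ approximated by a Riemann sum.

```r
# Toy data generator for model (M1): X(t) is a Gaussian process with mean
# mu(t) = -5 + 4t - 4t^2; the Matern 3/2 kernel, its range 0.2 and its unit
# variance are assumptions (the slide does not give them).
simulate_M1 <- function(n = 100, p = 200) {
  t <- seq(0, 1, length.out = p)
  mu <- -5 + 4 * t - 4 * t^2
  h <- abs(outer(t, t, "-")) / 0.2                    # scaled distances
  K <- (1 + sqrt(3) * h) * exp(-sqrt(3) * h)          # Matern 3/2 covariance
  L <- chol(K + 1e-8 * diag(p))                       # jitter for stability
  X <- sweep(matrix(rnorm(n * p), n, p) %*% L, 2, mu, "+")  # rows ~ GP(mu, K)
  a1 <- sin(3 * pi * t / 2) * (t >= 0.2 & t <= 0.4)   # a_1(t), with j = 1
  y <- as.numeric(log(abs(X %*% a1 / p)))             # Y = log |<X, a_1>|
  list(X = X, y = y, t = t)
}
```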

Definition of the intervals

[Figure: estimated coefficients along the iterations of the procedure, for D = p = 200 intervals (initial state = LASSO), D = 142, D = 41 and D = 5.]

Second model

(M2): d = 3 and I_1 = [0, 0.1], I_2 = [0.5, 0.65] and I_3 = [0.65, 0.78].

[Figure: intervals recovered for (M2) by SISIR, standard SIR and sparse SIR.]

Tecator dataset

relevant intervals

easily interpretable

good MSE


Sunflower dataset

climatic time series (between 1975 and 2012, in France)

daily measures from April to October

X = evapotranspiration, Y = yield, n = 111, p = 309

Sunflower dataset

only two points identified outside the interval

focus on the second half of the interval

matches expert knowledge

Conclusion

SISIR:

a sparse dimension-reduction model adapted to the functional framework

a fully automated definition of the relevant intervals in the range of the predictors

The package SISIR is available on CRAN at https://cran.r-project.org/package=SISIR (see below).

Perspectives:

adaptation to multiple functional predictors X

application to large-scale real data (agricultural application: X = {temperature, rainfall ...}, Y = {yield})

replace the CV criterion?
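A minimal pointer for getting started: the slides only give the CRAN link and do not show the exported function names, so consult the package manual after installation.

```r
# Install and load the SISIR package from CRAN; see help(package = "SISIR")
# for the exported functions and their arguments.
install.packages("SISIR")
library(SISIR)
```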


References

Aneiros, G. and Vieu, P. (2014). Variable selection in infinite-dimensional problems. Statistics and Probability Letters, 94:12–20.

Bernard-Michel, C., Gardes, L., and Girard, S. (2008). A note on sliced inverse regression with regularizations. Biometrics, 64(3):982–986.

Coudret, R., Liquet, B., and Saracco, J. (2014). Comparison of sliced inverse regression approaches for underdetermined cases. Journal de la Société Française de Statistique, 155(2):72–96.

Fauvel, M., Dechesne, C., Zullo, A., and Ferraty, F. (2015). Fast forward feature selection of hyperspectral images for classification with Gaussian mixture models. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(6):2824–2831.

Ferraty, F. and Hall, P. (2015). An algorithm for nonlinear, nonparametric model choice and prediction. Journal of Computational and Graphical Statistics, 24(3):695–714.

Ferraty, F., Hall, P., and Vieu, P. (2010). Most-predictive design points for functional data predictors. Biometrika, 97(4):807–824.

Ferré, L. and Yao, A. (2003). Functional sliced inverse regression analysis. Statistics, 37(6):475–488.

Fraiman, R., Gimenez, Y., and Svarc, M. (2016). Feature selection for functional data. Journal of Multivariate Analysis, 146:191–208.

Gregorutti, B., Michel, B., and Saint-Pierre, P. (2015). Grouped variable importance with random forests and application to multiple functional data analysis. Computational Statistics and Data Analysis, 90:15–35.

Grollemund, P., Abraham, C., Baragatti, M., and Pudlo, P. (2017). Bayesian functional linear regression with sparse step functions. Preprint.

James, G., Wang, J., and Zhu, J. (2009). Functional linear regression that's interpretable. Annals of Statistics, 37(5A):2083–2108.

Li, L. and Nachtsheim, C. (2008). Sparse sliced inverse regression. Technometrics, 48(4):503–510.

Li, L. and Yin, X. (2008). Sliced inverse regression with regularizations. Biometrics, 64(1):124–131.

Liquet, B. and Saracco, J. (2012). A graphical tool for selecting the number of slices and the dimension of the model in SIR and SAVE approaches. Computational Statistics, 27(1):103–125.

Matsui, H. and Konishi, S. (2011). Variable selection for functional regression models via the L1 regularization. Computational Statistics and Data Analysis, 55(12):3304–3310.

Ni, L., Cook, D., and Tsai, C. (2005). A note on shrinkage sliced inverse regression. Biometrika, 92(1):242–247.

Park, A., Aston, J., and Ferraty, F. (2016). Stable and predictive functional domain selection with application to brain images. Preprint arXiv:1606.02186.

Parameter estimation

H (number of slices): SIR is usually known to be not very sensitive to the number of slices (as long as H > d + 1). We took H = 10 (i.e., 10 to 30 observations per slice).

µ_2 and d (ridge estimate Â):

L-fold CV for µ_2 (for a d_0 large enough). Note that the GCV described in [Li and Yin, 2008] cannot be used, since the current version of the L2 penalty involves the use of an estimate of Σ^{−1}.

Using L-fold CV again, for every d = 1, ..., d_0, an estimate of

R(d) = d − E[Tr(Π_d Π̂_d)],

in which Π_d and Π̂_d are the projectors onto the first d dimensions of the EDR space and of its estimate, is derived similarly as in [Liquet and Saracco, 2012]. The evolution of R(d) versus d is studied to select a relevant d (a toy computation is sketched below).

µ_1 (LASSO): glmnet is used, and µ_1 is selected by CV along the regularization path.
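The quantity inside the expectation of R(d) is easy to compute from two estimated bases. A toy sketch for a single CV split; Euclidean projectors are used for simplicity, and the expectation would be approximated by averaging over the L folds:

```r
# Toy computation of d - Tr(Pi_d hatPi_d) for one split: A_full and A_cv are
# p x d0 matrices whose first d columns span the two EDR space estimates.
proj <- function(B) B %*% solve(crossprod(B), t(B))   # orthogonal projector
r_d <- function(A_full, A_cv, d) {
  P1 <- proj(A_full[, seq_len(d), drop = FALSE])
  P2 <- proj(A_cv[, seq_len(d), drop = FALSE])
  d - sum(diag(P1 %*% P2))    # close to 0 when the two spans agree
}
```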