Large deviations: theory and applications

PRELIMINARY VERSION

Thierry Bodineau

École Polytechnique Paris, Département de Mathématiques Appliquées

[email protected]

December 11, 2015


Contents

1 Large deviations
  1.1 Introduction
  1.2 Cramér's Theorem in R
    1.2.1 Statement and proof of Cramér's Theorem
    1.2.2 Some examples of rate functions
    1.2.3 First consequences
  1.3 Large deviation principle
    1.3.1 Properties of the Legendre transform
    1.3.2 Large deviation principle
  1.4 Moderate deviations

2 Applications
  2.1 Branching process
    2.1.1 The model
    2.1.2 Large deviations
  2.2 Directed polymer
    2.2.1 The model
    2.2.2 Large deviations

3 Gärtner-Ellis Theorem
  3.1 Statement of the Gärtner-Ellis Theorem
  3.2 Proof of the Gärtner-Ellis Theorem
  3.3 Curie-Weiss model (Episode II)

4 Sample path large deviations
  4.1 Random walk large deviations
    4.1.1 Mogulskii's Theorem
    4.1.2 A local large deviation upper bound
    4.1.3 Exponential tightness
    4.1.4 Proof of the lower bound
  4.2 Schilder's Theorem


Chapter 1

Large deviations.

1.1 Introduction

For a sequence $\{X_i\}_{i \geq 1}$ of independent identically distributed (iid) variables with finite mean, the law of large numbers ensures the almost sure convergence of the empirical mean
$$\bar{S}_n := \frac{1}{n} \sum_{i=1}^n X_i\ \xrightarrow[n \to \infty]{}\ E(X_1).$$

If the variables have a finite variance $\sigma^2$, then the central limit theorem allows one to control the fluctuations around the mean $E(X_1)$:
$$P\left( \bar{S}_n - E(X_1) \in \left[ \frac{a}{\sqrt{n}}, \frac{b}{\sqrt{n}} \right] \right)\ \xrightarrow[n \to \infty]{}\ \frac{1}{\sqrt{2\pi\sigma^2}} \int_a^b \exp\left( -\frac{u^2}{2\sigma^2} \right) du.$$

Thus with high probability the empirical mean $\bar{S}_n$ concentrates in a window of width $1/\sqrt{n}$ around $E(X_1)$. Large deviation theory investigates the probability of observing atypical events. The main goal of this course is twofold:

• Deriving explicit exponential decays of the form
$$\forall\, 0 < a < b, \quad P\big( \bar{S}_n - E(X_1) \in [a, b] \big) \simeq \exp\big( -n\, C(a, b) \big).$$

• Describing the mechanisms leading to the atypical events.

We refer the reader to the books by Dembo and Zeitouni [2] and den Hollander [3] for an in-depth study of large deviations.

1.2 Cramér's Theorem in R

We first present a derivation of Cramér's Theorem in a simple setting, namely the case of iid variables $\{X_i\}_{i \geq 1}$ taking values in $\mathbb{R}$. Throughout these notes, we shall use the following notation:


• The sum and the empirical mean are denoted by
$$\forall n \geq 1, \quad S_n = \sum_{i=1}^n X_i \quad \text{and} \quad \bar{S}_n = \frac{1}{n} \sum_{i=1}^n X_i.$$

• The Log-Laplace transform is given by
$$\forall \lambda \in \mathbb{R}, \quad \varphi(\lambda) = \log\big( E( \exp(\lambda X_1) ) \big) \quad \text{with} \quad D_\varphi = \{ \lambda \in \mathbb{R},\ \varphi(\lambda) < \infty \}. \tag{1.1}$$

• The Legendre transform of $\varphi$ is defined as
$$\forall x \in \mathbb{R}, \quad \varphi^*(x) = \sup_{\lambda \in \mathbb{R}} \{ x\lambda - \varphi(\lambda) \} \quad \text{with} \quad D_{\varphi^*} = \{ x \in \mathbb{R},\ \varphi^*(x) < \infty \}. \tag{1.2}$$
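These definitions can be illustrated numerically. The sketch below is ours, not part of the notes (the helper names are hypothetical, and a finite grid stands in for the supremum over all of $\mathbb{R}$): it evaluates the Legendre transform of the Log-Laplace transform of a standard Gaussian variable, for which $\varphi(\lambda) = \lambda^2/2$ and hence $\varphi^*(x) = x^2/2$.

```python
import math

def log_laplace_gaussian(lam, sigma2=1.0):
    # For a centered Gaussian with variance sigma2, phi(lambda) = sigma2 * lambda^2 / 2.
    return 0.5 * sigma2 * lam * lam

def legendre(phi, x, lam_min=-50.0, lam_max=50.0, steps=200001):
    # Grid approximation of phi*(x) = sup_lambda { x*lambda - phi(lambda) }.
    h = (lam_max - lam_min) / (steps - 1)
    return max(x * (lam_min + i * h) - phi(lam_min + i * h) for i in range(steps))

# phi*(x) should match x^2 / 2 for the standard Gaussian.
for x in (0.0, 1.0, -2.0):
    assert abs(legendre(log_laplace_gaussian, x) - x * x / 2) < 1e-3
```

The grid bounds are an assumption of the sketch; for heavier-tailed $\varphi$ the maximizer can escape any fixed window.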

1.2.1 Statement and proof of Cramér's Theorem

Theorem 1.1. Suppose that $D_\varphi = \mathbb{R}$, i.e.
$$\forall \lambda \in \mathbb{R}, \quad \varphi(\lambda) < \infty. \tag{1.3}$$

Then
$$\forall a > E(X), \quad \limsup_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \geq a \big) \leq -\varphi^*(a), \tag{1.4}$$

and
$$\forall\, ]a, b[\, \subset \mathbb{R}, \quad \liminf_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \in\, ]a, b[\, \big) \geq -\inf_{x \in ]a, b[} \{ \varphi^*(x) \}. \tag{1.5}$$

By symmetry, (1.4) immediately implies that for $a \leq E(X)$
$$\limsup_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \leq a \big) \leq -\varphi^*(a).$$

Figure 1.1: On the left, the graph of $\varphi$, with $\varphi(0) = 0$ and $\varphi'(0) = E(X_1)$. On the right, the function $\varphi^* \geq 0$ is depicted; it is strictly convex and its minimum is reached at $E(X_1)$.


Proof. The proof is split into three steps.

Step 1. Preliminary results.
By assumption (1.3), $\varphi$ is well defined on $\mathbb{R}$. Jensen's inequality implies that
$$\varphi(\lambda) = \log\big( E( \exp(\lambda X_1) ) \big) \geq \lambda E(X_1). \tag{1.6}$$

Applying this inequality for $\lambda > 0$ implies that $E(X_1) < \infty$, and for $\lambda < 0$ that $E(X_1) > -\infty$. As a consequence of (1.3), we get that
$$\big| E(X_1) \big| < \infty. \tag{1.7}$$

Inequality (1.6) also implies that
$$\varphi^*\big( E(X_1) \big) = \sup_{\lambda \in \mathbb{R}} \{ \lambda E(X_1) - \varphi(\lambda) \} = 0. \tag{1.8}$$

Finally, we also obtain from (1.6) that for any $a > E(X)$
$$\forall \lambda \leq 0, \quad \lambda a - \varphi(\lambda) \leq \lambda\big( a - E(X_1) \big) \leq 0.$$

As $\varphi^*(a) \geq 0$, we deduce that
$$\forall a > E(X), \quad \varphi^*(a) = \sup_{\lambda \geq 0} \{ \lambda a - \varphi(\lambda) \}. \tag{1.9}$$

Step 2. Proof of the upper bound (1.4).
By applying Chebyshev's inequality, we get for any $\lambda \geq 0$
$$P\big( \bar{S}_n \geq a \big) \leq \exp(-n\lambda a)\, E\left( \exp\left( \lambda \sum_{i=1}^n X_i \right) \right).$$

Since the variables are independent, the expectation factorizes:
$$P\big( \bar{S}_n \geq a \big) \leq \exp(-n\lambda a)\, E\big( \exp(\lambda X_1) \big)^n = \exp\big( -n( \lambda a - \varphi(\lambda) ) \big).$$

The previous inequality is valid for any $\lambda \geq 0$, thus from (1.9) we deduce that
$$\forall n \geq 1, \quad P\big( \bar{S}_n \geq a \big) \leq \exp\big( -n\varphi^*(a) \big). \tag{1.10}$$

The upper bound (1.4) follows easily from this inequality. We stress that inequality (1.10) is stronger than (1.4), as it is valid for any $n$. This is reminiscent of inequalities in measure concentration theory [1, 4].
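For $\pm 1$ coins the non-asymptotic bound (1.10) can be verified exactly, since the tail probability is a finite binomial sum and $\varphi^*$ has the closed form derived in Section 1.2.2. The sketch below is our own illustration (hypothetical helper names), not part of the notes.

```python
import math

def rate_function(x, p=0.5):
    # Closed-form phi*(x) for P(X=1)=p, P(X=-1)=1-p (see Section 1.2.2), |x| < 1.
    return (0.5 * (1 + x) * math.log((1 + x) / p)
            + 0.5 * (1 - x) * math.log((1 - x) / (1 - p))
            - math.log(2))

def tail_prob(n, a, p=0.5):
    # Exact P(bar(S)_n >= a) for n iid +-1 coins: S_n = 2k - n with k heads.
    k_min = math.ceil(n * (1 + a) / 2)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# The non-asymptotic Chernoff bound (1.10): P(bar(S)_n >= a) <= exp(-n phi*(a)).
a = 0.3
for n in (10, 50, 200):
    assert tail_prob(n, a) <= math.exp(-n * rate_function(a))
```

The inequality holds for every $n$, not only asymptotically, which is exactly the point stressed above.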

Step 3. Proof of the lower bound (1.5).
For any $x$ in $]a, b[$, we are going to prove that for any $\delta > 0$
$$\liminf_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \in\, ]x - \delta, x + \delta[\, \big) \geq -\varphi^*(x). \tag{1.11}$$

To do so, we rely on the following important properties of $\varphi$, which will be derived later in Proposition 1.4:


• $\varphi$ is a strictly convex function.

• $\varphi$ is differentiable and belongs to $C^1(\mathbb{R}, \mathbb{R})$. Furthermore
$$\varphi'(\lambda) = \frac{E\big( X \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)}. \tag{1.12}$$

Let us introduce the limits (which could be equal to $\pm\infty$)
$$\alpha_+ = \lim_{\lambda \to \infty} \varphi'(\lambda) \in \mathbb{R} \cup \{\infty\}, \qquad \alpha_- = \lim_{\lambda \to -\infty} \varphi'(\lambda) \in \mathbb{R} \cup \{-\infty\}. \tag{1.13}$$

If $x$ belongs to $]\alpha_+, \infty[$, then
$$\lim_{\lambda \to \infty} x\lambda - \varphi(\lambda) = \infty \quad \text{so that} \quad \varphi^*(x) = \infty.$$

Similarly, if $x$ belongs to $]-\infty, \alpha_-[$ then $\varphi^*(x) = \infty$. Thus to prove inequality (1.11), it is enough to focus on $x \in [\alpha_-, \alpha_+]$.

Case 1. Suppose that $x \in\, ]\alpha_-, \alpha_+[$.
As $\varphi'$ is strictly increasing in $]\alpha_-, \alpha_+[$, there exists a unique $\theta$ such that
$$x = \varphi'(\theta), \quad \text{and furthermore} \quad \varphi^*(x) = \theta x - \varphi(\theta), \tag{1.14}$$
where the last equality follows from the fact that $\lambda \mapsto \lambda x - \varphi(\lambda)$ is strictly concave and therefore its maximum is reached at $\theta$.

Let us denote by $\mu(dy)$ the measure associated to $X_1$. We consider the tilted measure $\mu_\theta(dy)$ defined as
$$\mu_\theta(dy) = \frac{\exp(\theta y)}{E\big( \exp(\theta X_1) \big)}\, \mu(dy). \tag{1.15}$$

The mean of the tilted measure is
$$\int_{\mathbb{R}} \mu_\theta(dy)\, y = x. \tag{1.16}$$

To see this, we use that $x = \varphi'(\theta)$ and the identity
$$\int_{\mathbb{R}} \mu_\theta(dy)\, y = \int_{\mathbb{R}} \mu(dy)\, \frac{y \exp(\theta y)}{E\big( \exp(\theta X_1) \big)} = \frac{E\big( X_1 \exp(\theta X_1) \big)}{E\big( \exp(\theta X_1) \big)} = \varphi'(\theta),$$
where the identification with $\varphi'(\theta)$ follows from (1.12). Let us denote by $\{\tilde{X}_n\}_{n \geq 1}$ a sequence of iid random variables distributed according to $\mu_\theta$, and by $\tilde{P}$ the associated probability. Its empirical mean converges to $x$ by the law of large numbers; in particular,
$$\lim_{n \to \infty} \tilde{P}\left( \frac{1}{n} \sum_{i=1}^n \tilde{X}_i \in\, ]x - \delta, x + \delta[\, \right) = 1. \tag{1.17}$$

Using the previous notation, we now resume the proof of (1.11). First, we write
$$P\big( \bar{S}_n \in\, ]x - \delta, x + \delta[\, \big) = \int_{\mathbb{R}^n} \mu(dy_1) \ldots \mu(dy_n)\, \mathbf{1}_{A_n(x)},$$


where
$$A_n(x) = \left\{ (y_1, \ldots, y_n) \in \mathbb{R}^n,\ \frac{1}{n} \sum_{i=1}^n y_i \in\, ]x - \delta, x + \delta[\, \right\}.$$

The key argument to derive the lower bound is a change of measure:
$$\int_{\mathbb{R}^n} \mu(dy_1) \ldots \mu(dy_n)\, \mathbf{1}_{A_n(x)} = E\big( \exp(\theta X_1) \big)^n \int_{A_n(x)} \mu(dy_1) \frac{\exp(\theta y_1)}{E\big( \exp(\theta X_1) \big)} \ldots \mu(dy_n) \frac{\exp(\theta y_n)}{E\big( \exp(\theta X_1) \big)}\, \exp\left( -\theta \sum_{i=1}^n y_i \right)$$
$$= E\big( \exp(\theta X_1) \big)^n \int_{A_n(x)} \mu_\theta(dy_1) \ldots \mu_\theta(dy_n)\, \exp\left( -\theta \sum_{i=1}^n y_i \right).$$

Since $\sum_{i=1}^n y_i$ is close to $nx$ for variables in $A_n(x)$, the previous formula can be simplified:
$$\int_{\mathbb{R}^n} \mu(dy_1) \ldots \mu(dy_n)\, \mathbf{1}_{A_n(x)} \geq \exp\big( n\varphi(\theta) \big)\, \exp\big( -n(\theta x + |\delta\theta|) \big) \int_{A_n(x)} \mu_\theta(dy_1) \ldots \mu_\theta(dy_n)$$
$$= \exp\big( -n( \theta x - \varphi(\theta) ) - n|\delta\theta| \big)\, \tilde{P}\left( \frac{1}{n} \sum_{i=1}^n \tilde{X}_i \in\, ]x - \delta, x + \delta[\, \right).$$

Using (1.17) and the fact that $\varphi^*(x) = \theta x - \varphi(\theta)$, we deduce that

$$\liminf_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \in\, ]x - \delta, x + \delta[\, \big) \geq -\varphi^*(x) - |\delta\theta|. \tag{1.18}$$

As the events $\{ \bar{S}_n \in\, ]x - \delta, x + \delta[\, \}$ are decreasing as $\delta$ decreases, the inequality (1.11) is recovered by letting $\delta$ tend to 0.
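The tilting construction (1.15)-(1.16) is easy to make concrete for $\pm 1$ coins. The snippet below is illustrative only (the helper names are ours): it tilts the fair coin by $\theta$ and checks that the tilted mean equals a target $x$, using the closed form for $\theta_x$ computed in Section 1.2.2.

```python
import math

def tilted_mean(theta, p=0.5):
    # Mean of the tilted measure mu_theta on {+1, -1}, cf. (1.15)-(1.16):
    # weights p*e^theta and (1-p)*e^(-theta), normalized by E(exp(theta*X1)).
    z = p * math.exp(theta) + (1 - p) * math.exp(-theta)
    return (p * math.exp(theta) - (1 - p) * math.exp(-theta)) / z

# Solve x = phi'(theta) for the fair coin via the closed form of Section 1.2.2:
# 2*theta_x = log((1+x)/p) - log((1-x)/(1-p)).
x = 0.4
theta_x = 0.5 * (math.log((1 + x) / 0.5) - math.log((1 - x) / 0.5))
assert abs(tilted_mean(theta_x) - x) < 1e-12
```

Under the tilted measure the target deviation becomes typical, which is the whole point of the change of measure.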

Case 2. Suppose that $x = \alpha_+$ (the case $x = \alpha_-$ is similar). Two sub-cases have to be distinguished.

• If $\mu(\alpha_+) = 0$, then $\varphi^*(\alpha_+) = \infty$ and therefore inequality (1.11) holds. To check this, we write
$$\varphi^*(\alpha_+) = \sup_{\lambda \geq 0} \left\{ \lambda \alpha_+ - \log\big( E( \exp(\lambda X) ) \big) \right\} = \sup_{\lambda \geq 0} \left\{ -\log\big( E( \exp( \lambda(X - \alpha_+) ) ) \big) \right\}.$$

As almost surely $X - \alpha_+ < 0$, the dominated convergence theorem implies
$$\lim_{\lambda \to \infty} E\big( \exp( \lambda(X - \alpha_+) ) \big) = 0,$$
so that $\varphi^*(\alpha_+) = \infty$.

• Suppose that $\mu(\alpha_+) > 0$. We write
$$\varphi^*(\alpha_+) = \sup_{\lambda \geq 0} \left\{ -\log\Big( \mu(\alpha_+) + E\big( \mathbf{1}_{\{X - \alpha_+ < 0\}} \exp( \lambda(X - \alpha_+) ) \big) \Big) \right\}.$$

Taking the limit $\lambda \to \infty$ as before implies that $\varphi^*(\alpha_+) = -\log \mu(\alpha_+)$.

The lower bound (1.11) follows by noticing that
$$P\big( \bar{S}_n \in\, ]\alpha_+ - \delta, \alpha_+ + \delta[\, \big) \geq P\big( X_1 = \alpha_+ \big)^n = \exp\big( -n\varphi^*(\alpha_+) \big). \tag{1.19}$$


1.2.2 Some examples of rate functions

We compute the rate function ϕ∗ for different laws:

Gaussian law. Suppose that $X_1$ is a centered Gaussian variable with $E(X_1) = 0$ and $\mathrm{Var}(X_1) = \sigma^2$; then
$$\forall \lambda \in \mathbb{R}, \quad \varphi(\lambda) = \log E\big( \exp(\lambda X_1) \big) = \frac{1}{2} \sigma^2 \lambda^2.$$

Thus the rate function is
$$\forall x \in \mathbb{R}, \quad \varphi^*(x) = \sup_{\lambda \in \mathbb{R}} \left\{ x\lambda - \frac{1}{2} \sigma^2 \lambda^2 \right\} = \frac{x^2}{2\sigma^2},$$
and the supremum is reached at $\lambda = \frac{x}{\sigma^2}$.

Bernoulli law. Suppose that $P(X_1 = 1) = 1 - P(X_1 = -1) = p$; then
$$\forall \lambda \in \mathbb{R}, \quad \varphi(\lambda) = \log\big( p \exp(\lambda) + (1 - p) \exp(-\lambda) \big).$$

In order to compute
$$\varphi^*(x) = \sup_{\lambda \in \mathbb{R}} \{ x\lambda - \varphi(\lambda) \} \quad \text{for } x \in [-1, 1],$$
one has to solve
$$x = \varphi'(\lambda_x) = \frac{p \exp(\lambda_x) - (1 - p) \exp(-\lambda_x)}{p \exp(\lambda_x) + (1 - p) \exp(-\lambda_x)} \quad \Rightarrow \quad 2\lambda_x = \log\left( \frac{1 + x}{p} \right) - \log\left( \frac{1 - x}{1 - p} \right).$$

This leads to
$$\varphi(\lambda_x) = -\frac{1}{2} \left( \log\left( \frac{1 + x}{p} \right) + \log\left( \frac{1 - x}{1 - p} \right) \right) + \log 2,$$
so that the rate function is given by
$$\varphi^*(x) = x\lambda_x - \varphi(\lambda_x) = \frac{1 + x}{2} \log\left( \frac{1 + x}{p} \right) + \frac{1 - x}{2} \log\left( \frac{1 - x}{1 - p} \right) - \log 2.$$
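As a sanity check, this closed form can be compared with a direct numerical evaluation of the supremum defining $\varphi^*$. The snippet below is illustrative only (a coarse grid search over $\lambda$ stands in for the exact optimization); it performs the comparison for $p = 0.3$.

```python
import math

def phi(lam, p):
    # Log-Laplace transform of the Bernoulli law P(X=1)=p, P(X=-1)=1-p.
    return math.log(p * math.exp(lam) + (1 - p) * math.exp(-lam))

def rate_closed_form(x, p):
    # phi*(x) = (1+x)/2 log((1+x)/p) + (1-x)/2 log((1-x)/(1-p)) - log 2.
    return (0.5 * (1 + x) * math.log((1 + x) / p)
            + 0.5 * (1 - x) * math.log((1 - x) / (1 - p))
            - math.log(2))

def rate_numeric(x, p, lo=-30.0, hi=30.0, steps=120001):
    # Crude grid approximation of sup_lambda { x*lambda - phi(lambda) }.
    h = (hi - lo) / (steps - 1)
    return max(x * (lo + i * h) - phi(lo + i * h, p) for i in range(steps))

for x in (-0.5, 0.0, 0.6):
    assert abs(rate_numeric(x, 0.3) - rate_closed_form(x, 0.3)) < 1e-4
```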

Exponential law. Suppose that $X_1$ follows an exponential law on $\mathbb{R}_+$ with parameter $\theta > 0$, i.e. that its density is given by $\theta \exp(-\theta x)\, dx$. In this case
$$\varphi(\lambda) = \log \int_{\mathbb{R}_+} \theta \exp\big( (\lambda - \theta)x \big)\, dx = \begin{cases} \log \frac{\theta}{\theta - \lambda}, & \text{for } \lambda < \theta, \\ \infty, & \text{for } \lambda \geq \theta. \end{cases}$$

Note that $D_\varphi = \,]-\infty, \theta[$; nevertheless, the proof of Cramér's Theorem can be adapted to this case, and the rate function is given by
$$\varphi^*(x) = \sup_{\lambda < \theta} \left\{ x\lambda - \log \frac{\theta}{\theta - \lambda} \right\} = \begin{cases} \theta x - 1 - \log(\theta x), & \text{for } x > 0, \\ \infty, & \text{for } x \leq 0, \end{cases}$$
where the supremum is reached at $\lambda = \theta - 1/x$ for $x > 0$ and approached as $\lambda \to -\infty$ for $x \leq 0$. As the variables $X_i$ are almost surely positive, the probability of $\{\bar{S}_n \leq 0\}$ is zero. This explains why the rate function $\varphi^*$ is infinite on $\mathbb{R}_-$.
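The same numerical cross-check works here, provided the grid stays strictly below the singularity at $\lambda = \theta$. The snippet below is illustrative (hypothetical helper names, grid search in place of the exact optimization), with $\theta = 2$.

```python
import math

def rate_exponential(x, theta):
    # Closed form from the notes: phi*(x) = theta*x - 1 - log(theta*x), x > 0.
    return theta * x - 1 - math.log(theta * x)

def rate_numeric(x, theta, steps=200001):
    # Grid approximation of sup over lambda < theta of x*lambda - log(theta/(theta-lambda)).
    lo, hi = -100.0, theta - 1e-6   # stay strictly below the singularity at theta
    h = (hi - lo) / (steps - 1)
    return max(x * (lo + i * h) - math.log(theta / (theta - (lo + i * h)))
               for i in range(steps))

theta = 2.0
for x in (0.2, 0.5, 3.0):
    assert abs(rate_numeric(x, theta) - rate_exponential(x, theta)) < 1e-3
```

Note that $x = 0.5$ is the mean $1/\theta$, where the rate function vanishes, consistent with (1.8).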


Polynomial tails. We consider now random variables with a density $f$ which has polynomial tails, say of the form $f(x) \simeq \frac{1}{x^\alpha}$ as $x$ tends to infinity. For $\alpha > 2$, the variable has a mean and all moments of order $k < \alpha - 1$, but no exponential moments:
$$\forall \lambda \neq 0, \quad \varphi(\lambda) = \infty \quad \Rightarrow \quad \varphi^*(x) = \sup_{\lambda \in \mathbb{R}} \{ x\lambda - \varphi(\lambda) \} = 0,$$
where the supremum is always reached at $\lambda = 0$. This comes from the fact that the probability of a large deviation decays polynomially and therefore has no cost at the exponential level.

1.2.3 First consequences

As a first consequence of Cramér's Theorem, the law of large numbers can be recovered.

Corollary 1.2. The empirical mean $\bar{S}_n$ converges almost surely to $E(X_1)$.

Proof. Applying inequality (1.10), one has for any $n \geq 1$
$$P\Big( \bar{S}_n \notin \big[ E(X_1) - \varepsilon,\ E(X_1) + \varepsilon \big] \Big) \leq 2 \exp\Big( -n \inf\big\{ \varphi^*( E(X_1) - \varepsilon ),\ \varphi^*( E(X_1) + \varepsilon ) \big\} \Big). \tag{1.20}$$

Recall that $\varphi^*\big( E(X_1) \big) = 0$ and that $\varphi^*$ is strictly convex (see Proposition 1.5). By continuity of $\varphi^*$, there is a sequence $\varepsilon_n$ such that
$$\lim_{n \to \infty} \varepsilon_n = 0 \quad \text{and} \quad \forall n \geq 1, \quad \inf\big\{ \varphi^*( E(X_1) - \varepsilon_n ),\ \varphi^*( E(X_1) + \varepsilon_n ) \big\} \geq \frac{1}{\sqrt{n}}.$$

From inequality (1.20),
$$\sum_{n=1}^{\infty} P\Big( \bar{S}_n \notin \big[ E(X_1) - \varepsilon_n,\ E(X_1) + \varepsilon_n \big] \Big) \leq 2 \sum_{n=1}^{\infty} \exp(-\sqrt{n}) < \infty.$$

By applying the Borel-Cantelli Theorem, we deduce that $\bar{S}_n$ converges almost surely to $E(X_1)$.

The behavior of atypical events can also be understood by using large deviation estimates.

Corollary 1.3. Let $a > E(X_1)$; then
$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} P\Big( \bar{S}_n \in [a, a + \varepsilon] \,\Big|\, \bar{S}_n \geq a \Big) = 1.$$

Thus, conditionally on observing a deviation of the empirical mean $\bar{S}_n$ beyond $a$, with high probability this deviation will not be much larger than $a$. The rate function captures the optimal way of realizing an atypical event.


Proof. Given $\varepsilon > 0$, one has
$$\log P\Big( \bar{S}_n \geq a + \varepsilon \,\Big|\, \bar{S}_n \geq a \Big) = \log P\big( \bar{S}_n \geq a + \varepsilon \big) - \log P\big( \bar{S}_n \geq a \big).$$

Since $a > E(X_1)$ and $\varphi^*$ is strictly convex (see Proposition 1.5), one has $\varphi^*(a) = \inf_{x \geq a} \{ \varphi^*(x) \}$. Thus, applying the upper and lower bounds of the large deviation principle, we get for any $\varepsilon > 0$
$$\limsup_{n \to \infty} \frac{1}{n} \log P\Big( \bar{S}_n \geq a + \varepsilon \,\Big|\, \bar{S}_n \geq a \Big) \leq -\varphi^*(a + \varepsilon) + \varphi^*(a) < 0,$$
where we used the strict convexity of $\varphi^*$ in the last inequality. This completes the corollary.

1.3 Large deviation principle

1.3.1 Properties of the Legendre transform

In this section, we state some general properties of the Log-Laplace transform $\varphi$ introduced in (1.1),
$$\forall \lambda \in \mathbb{R}, \quad \varphi(\lambda) = \log\big( E( \exp(\lambda X) ) \big),$$
and of its Legendre transform (1.2). Let $D_\varphi = \{ \lambda \in \mathbb{R},\ \varphi(\lambda) < \infty \}$ be the domain of definition of $\varphi$. It is an interval of $\mathbb{R}$ containing 0 (and it could be reduced to $\{0\}$).

Proposition 1.4. The Log-Laplace transform $\varphi$ satisfies:

1. $\varphi$ is $C^\infty$ in the interior of $D_\varphi$ and
$$\varphi'(\lambda) = \frac{E\big( X \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)}, \qquad \varphi''(\lambda) = \frac{E\big( X^2 \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)} - \left[ \frac{E\big( X \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)} \right]^2. \tag{1.21}$$

In particular, if the origin is in the interior of $D_\varphi$, then (see figure 3.2)
$$\varphi'(0) = E(X) \quad \text{and} \quad \varphi''(0) = E(X^2) - E(X)^2 = \mathrm{Var}(X). \tag{1.22}$$

2. ϕ is a strictly convex function.

Proof.

Assertion 1. Differentiability.
We first check that $\varphi$ is continuous at a point $\lambda$ in the interior of $D_\varphi$. Suppose that $[\lambda - \delta, \lambda + \delta] \subset D_\varphi$; then
$$\forall u \in [\lambda - \delta, \lambda + \delta], \quad \exp(uX) \leq \exp\big( (\lambda - \delta)X \big) + \exp\big( (\lambda + \delta)X \big).$$

Since the bound is uniform in $u$, the dominated convergence theorem ensures that
$$\lim_{u \to \lambda} E\big( \exp(uX) \big) = E\big( \exp(\lambda X) \big),$$


from which we deduce that $\varphi$ is continuous at $\lambda$. To prove that $\varphi$ is once differentiable at $\lambda$, let us introduce
$$h_\varepsilon(\lambda, x) := \frac{\exp\big( (\lambda + \varepsilon)x \big) - \exp(\lambda x)}{\varepsilon},$$
which satisfies $\lim_{\varepsilon \to 0} h_\varepsilon(\lambda, x) = x \exp(\lambda x)$. We remark that for any $\varepsilon \in [-\delta, \delta]$, the function $h_\varepsilon$ is uniformly bounded:
$$\big| h_\varepsilon(\lambda, x) \big| \leq \exp(\lambda x)\, \frac{\exp(\varepsilon |x|) - 1}{\varepsilon} \leq \exp(\lambda x)\, \frac{\exp(\delta |x|) - 1}{\delta}.$$

Rephrased in terms of random variables,
$$\big| h_\varepsilon(\lambda, X) \big| \leq \exp\big( (\lambda + \delta)X \big) + \exp\big( (\lambda - \delta)X \big).$$

As $\lambda$ is in the interior of $D_\varphi$, the variable on the right-hand side has finite expectation for $\delta$ small. Thus the dominated convergence theorem implies that $\lambda \mapsto E\big( \exp(\lambda X) \big)$ is differentiable, and
$$\varphi'(\lambda) = \frac{\partial_\lambda E\big( \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)} = \frac{E\big( X \exp(\lambda X) \big)}{E\big( \exp(\lambda X) \big)}.$$

Iterating the same argument, one can show that $\varphi$ is $C^\infty$ in the interior of $D_\varphi$. The second identity in (1.21) can be obtained by differentiating inside the expectation.

Assertion 2. Convexity.
Let $\mu(dx)$ be the law of the variable $X$, and consider a new variable $\tilde{X}$ with law $\frac{\exp(\lambda x)}{E(\exp(\lambda X))}\, \mu(dx)$ (which was already introduced in the derivation of the lower bound, see (1.15)). The corresponding expectation is denoted by $\tilde{E}$. From (1.21), we can rewrite
$$\varphi''(\lambda) = \tilde{E}\big( \tilde{X}^2 \big) - \tilde{E}\big( \tilde{X} \big)^2 = \mathrm{Var}\big( \tilde{X} \big) \geq 0. \tag{1.23}$$

Since $X$ is not constant, the variance of $\tilde{X}$ is positive and we deduce that $\varphi$ is strictly convex.

We turn now to the properties of the Legendre transform of $\varphi$,
$$\forall x \in \mathbb{R}, \quad \varphi^*(x) = \sup_{\lambda \in \mathbb{R}} \{ x\lambda - \varphi(\lambda) \}.$$

Proposition 1.5. Let $D_{\varphi^*} = \{ x \in \mathbb{R},\ \varphi^*(x) < \infty \}$ be the domain of definition of $\varphi^*$. Then $\varphi^*$ satisfies:

1. $\inf_{x \in \mathbb{R}} \varphi^*(x) = \varphi^*\big( E(X) \big) = 0$.

2. If 0 belongs to the interior of $D_\varphi$, then for any $L \geq 0$ the level set $\{ x \in \mathbb{R};\ \varphi^*(x) \leq L \}$ is compact. As a consequence, $\varphi^*$ is lower semicontinuous, i.e.
$$\text{for any sequence } \{y_n\}_{n \geq 1} \text{ converging to } y, \quad \liminf_n \varphi^*(y_n) \geq \varphi^*(y). \tag{1.24}$$


3. $\varphi^*$ is $C^\infty$ on the interior of $D_{\varphi^*}$, and if $\mathrm{Var}(X) := E(X^2) - E(X)^2 \neq 0$,
$$\partial_x \varphi^*\big( E(X) \big) = 0, \qquad \partial_x^2 \varphi^*\big( E(X) \big) = \frac{1}{\mathrm{Var}(X)}. \tag{1.25}$$

4. ϕ∗ is a strictly convex function.

Proof.
Assertion 1. $\varphi^*\big( E(X) \big) = 0$.
This result was obtained in (1.8).

Assertion 2. Compactness and lower semicontinuity.
To prove that $\{ x \in \mathbb{R};\ \varphi^*(x) \leq L \}$ is a closed set, it is enough to check the lower semicontinuity (1.24). Let $\{y_n\}_{n \geq 1}$ be a sequence converging to $y$. For any $\lambda$,
$$\liminf_n \varphi^*(y_n) \geq \liminf_n\ \big( \lambda y_n - \varphi(\lambda) \big) = \lambda y - \varphi(\lambda).$$

Since the inequality is valid for any $\lambda$, we deduce that $\liminf_n \varphi^*(y_n) \geq \varphi^*(y)$. As a consequence, if a sequence $\{y_n\}_{n \geq 1}$ in $\{ x \in \mathbb{R};\ \varphi^*(x) \leq L \}$ converges to $y$, then $y$ also belongs to the level set.

Finally, it remains to prove that $\{ x \in \mathbb{R};\ \varphi^*(x) \leq L \}$ is bounded if 0 belongs to the interior of $D_\varphi$. There exists $\delta > 0$ such that $\pm\delta$ belong to $D_\varphi$, thus
$$\forall x \in \mathbb{R}, \quad \varphi^*(x) \geq \delta x - \varphi(\delta) \quad \text{and} \quad \varphi^*(x) \geq -\delta x - \varphi(-\delta).$$

This implies that $\varphi^*(x)$ tends to infinity as $x$ diverges to $\pm\infty$, so that $\{ x \in \mathbb{R};\ \varphi^*(x) \leq L \}$ is a bounded set.

Assertion 3. Differentiability.
From Proposition 1.4, $\varphi$ is a strictly convex function and therefore $\varphi'$ is increasing. As in (1.13), we denote the range of $\varphi'$ by
$$\alpha_+ = \lim_{\lambda \to \infty} \varphi'(\lambda) \in \mathbb{R} \cup \{\infty\}, \qquad \alpha_- = \lim_{\lambda \to -\infty} \varphi'(\lambda) \in \mathbb{R} \cup \{-\infty\}.$$

For any $x \in [\alpha_-, \alpha_+]^c$, $\varphi^*(x) = \infty$, so that the interior of $D_{\varphi^*}$ is $]\alpha_-, \alpha_+[$. For any $x$ in $]\alpha_-, \alpha_+[$, there is a unique value $\lambda_x$ such that $x = \varphi'(\lambda_x)$. Furthermore, $\varphi'$ is $C^\infty$, thus the implicit function theorem implies that the function $x \mapsto \lambda_x$ belongs to $C^\infty(]\alpha_-, \alpha_+[, \mathbb{R})$. As $\lambda \mapsto x\lambda - \varphi(\lambda)$ is concave, its maximum is reached at $\lambda_x$ and
$$\varphi^*(x) = \lambda_x x - \varphi(\lambda_x). \tag{1.26}$$

This implies that $\varphi^*$ is also $C^\infty$.

Taking the derivative with respect to $x$ in (1.26) gives
$$\partial_x \varphi^*(x) = \lambda_x + \partial_x \lambda_x \big( x - \varphi'(\lambda_x) \big) = \lambda_x.$$

Note that by differentiating
$$x = \varphi'(\lambda_x) \quad \Rightarrow \quad 1 = \partial_x \lambda_x\, \varphi''(\lambda_x).$$


The strict convexity of $\varphi$ implies that $\varphi''$ is positive, so
$$\partial_x^2 \varphi^*(x) = \partial_x \lambda_x = \frac{1}{\varphi''(\lambda_x)} > 0. \tag{1.27}$$

Proposition 1.4 implies that $\varphi'(0) = E(X)$ and $\varphi''(0) = E(X^2) - E(X)^2 = \mathrm{Var}(X)$. Thus (1.25) follows by evaluating the derivatives at $\lambda_{E(X)} = 0$.

Assertion 4. Strict convexity.
The strict convexity of $\varphi^*$ follows from (1.27). Note that the convexity will be recovered in Proposition 1.6 without differentiating.

The Legendre transform can be applied in a more general setting, and for completeness we state an important duality property.

Proposition 1.6. Let $f$ be a function from $\mathbb{R}^n$ to $\mathbb{R}$; then its Legendre transform is given by
$$\forall x \in \mathbb{R}^n, \quad f^*(x) = \sup_{\lambda \in \mathbb{R}^n} \{ x \cdot \lambda - f(\lambda) \}, \tag{1.28}$$
where $x \cdot \lambda$ denotes the scalar product.

1. $f^*$ is a convex function.

2. The iterated Legendre transform $f^{**}$, defined by
$$\forall \lambda \in \mathbb{R}^n, \quad f^{**}(\lambda) = \sup_{x \in \mathbb{R}^n} \{ x \cdot \lambda - f^*(x) \},$$
satisfies $f \geq f^{**}$.

3. If $f$ is convex and lower semi-continuous, then $f^{**} = f$.

4. In general $f^{**} = \mathrm{Conv}(f)$, where $\mathrm{Conv}(f)$ stands for the convex envelope of $f$, i.e. the largest convex function which is smaller than $f$.

Figure 1.2: As $f$ is not convex, $f^{**} = \mathrm{Conv}(f)$ has a flat part depicted by the dashed line.

Proof.
Assertion 1. Convexity.


The Legendre transform $f^*$ is convex, as it is the supremum of the affine (hence convex) functions $x \mapsto g_\lambda(x) = x \cdot \lambda - f(\lambda)$. To see this, we write the convexity property of $g_\lambda$ for a given $\lambda$: for $x, y \in \mathbb{R}^n$ and $\alpha \in [0, 1]$,
$$g_\lambda\big( \alpha x + (1 - \alpha)y \big) \leq \alpha g_\lambda(x) + (1 - \alpha) g_\lambda(y) \leq \alpha \sup_{\gamma \in \mathbb{R}^n} g_\gamma(x) + (1 - \alpha) \sup_{\gamma \in \mathbb{R}^n} g_\gamma(y) \leq \alpha f^*(x) + (1 - \alpha) f^*(y).$$

Since the upper bound is valid for any $\lambda$, we conclude that $f^*$ is convex:
$$f^*\big( \alpha x + (1 - \alpha)y \big) \leq \alpha f^*(x) + (1 - \alpha) f^*(y).$$

Note that the convexity requires no differentiability assumption on $f$.

Assertion 2. $f \geq f^{**}$.
From the definition (1.28),
$$\forall x, \lambda \in \mathbb{R}^n, \quad f^*(x) \geq x \cdot \lambda - f(\lambda) \quad \Rightarrow \quad f(\lambda) \geq x \cdot \lambda - f^*(x).$$

As the previous inequality is valid for all $x$, we deduce that
$$f(\lambda) \geq \sup_{x \in \mathbb{R}^n} \{ x \cdot \lambda - f^*(x) \} =: f^{**}(\lambda).$$

Assertion 3. For $f$ convex and lower semi-continuous, $f = f^{**}$.

Assertion 4. $f^{**} = \mathrm{Conv}(f)$.
Let $g = \mathrm{Conv}(f)$. By construction $f \geq g$, and applying the Legendre transformation implies that $f^* \leq g^*$. Applying it a second time shows that $f^{**} \geq g^{**}$. Since $g$ is convex, we have shown that $f^{**} \geq g = g^{**}$. Conversely, $f^{**}$ is convex by Assertion 1 and satisfies $f \geq f^{**}$ by Assertion 2; as $g$ is the largest convex function below $f$, we deduce $f^{**} \leq g$, and therefore $f^{**} = g$.
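The identity $f^{**} = \mathrm{Conv}(f)$ can be visualized on a grid. The sketch below is illustrative only (finite grids replace the suprema over $\mathbb{R}$): it computes the discrete double Legendre transform of the non-convex double well $f(x) = \min((x - 1)^2, (x + 1)^2)$ and checks that, as in Figure 1.2, $f^{**}$ is flattened to 0 between the two minima.

```python
def legendre_grid(values, xs, lams):
    # Discrete Legendre transform: f*(lam) = max over the grid of x*lam - f(x).
    return [max(x * lam - v for x, v in zip(xs, values)) for lam in lams]

n = 401
xs = [-4 + 8 * i / (n - 1) for i in range(n)]       # primal grid on [-4, 4]
lams = [-6 + 12 * i / (n - 1) for i in range(n)]    # dual grid on [-6, 6]

f = [min((x - 1) ** 2, (x + 1) ** 2) for x in xs]   # non-convex double well
f_star = legendre_grid(f, xs, lams)                 # f*
f_star_star = legendre_grid(f_star, lams, xs)       # f** back on the x grid

i0 = xs.index(0.0)
# f(0) = 1, but the convex envelope is flat (equal to 0) on [-1, 1].
assert abs(f[i0] - 1.0) < 1e-12
assert abs(f_star_star[i0]) < 1e-9
```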

1.3.2 Large deviation principle

Cramér's Theorem stated in section 1.2 can be rephrased in a more general framework.

Definition 1.7 (Large deviation principle). A sequence of random variables $\{W_n\}_{n \geq 1}$ taking values in a (Polish¹) space $\mathcal{X}$ satisfies a large deviation principle with rate function $\mathcal{I}$ if

• for any closed subset $C$ of $\mathcal{X}$,
$$\limsup_{n \to \infty} \frac{1}{n} \log P\big( W_n \in C \big) \leq -\inf_{x \in C} \{ \mathcal{I}(x) \}; \tag{1.29}$$

• for any open subset $O$ of $\mathcal{X}$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P\big( W_n \in O \big) \geq -\inf_{x \in O} \{ \mathcal{I}(x) \}. \tag{1.30}$$

¹A Polish space is a separable, completely metrizable topological space.


Furthermore, the large deviation principle is said to be good if the level sets of $\mathcal{I}$ are compact, i.e.
$$\text{for any } L \geq 0, \text{ the set } \{ x \in \mathcal{X};\ \mathcal{I}(x) \leq L \} \text{ is compact in } \mathcal{X}. \tag{1.31}$$

The main reason for considering the additional compactness property (1.31) is related to the notion of exponential tightness.

Definition 1.8 (Exponential tightness). A sequence of random variables $\{W_n\}_{n \geq 1}$ taking values in a (Polish) space $\mathcal{X}$ is exponentially tight if for any $L > 0$ there exists a compact set $K_L \subset \mathcal{X}$ such that
$$\limsup_{n \to \infty} \frac{1}{n} \log P\big( W_n \in \mathcal{X} \setminus K_L \big) \leq -L.$$

An important property of a function $\mathcal{I}$ with compact level sets is that $\mathcal{I}$ is lower semi-continuous, i.e.
$$\text{for any sequence } \{y_n\}_{n \geq 1} \text{ converging to } y \text{ in } \mathcal{X}, \quad \liminf_n \mathcal{I}(y_n) \geq \mathcal{I}(y). \tag{1.32}$$

Theorem 1.9. If a sequence of random variables $\{W_n\}_{n \geq 1}$ satisfies a good large deviation principle, then its rate function is unique.

Proof. Suppose that there are two rate functions $\mathcal{I}$ and $\mathcal{J}$, and that there is $x$ such that $\mathcal{I}(x) < \mathcal{J}(x)$. Let $B_{1/k} = \{ y;\ \|y - x\| < 1/k \}$ be the ball of radius $1/k$ centered at $x$. Then the large deviation principle implies that for any $k > 0$
$$-\mathcal{I}(x) \leq \liminf_{n \to \infty} \frac{1}{n} \log P\big( W_n \in B_{1/k} \big) \leq \limsup_{n \to \infty} \frac{1}{n} \log P\big( W_n \in B_{1/k} \big) \leq -\inf_{y \in B_{1/k}} \{ \mathcal{J}(y) \}.$$

A good large deviation principle has a lower semi-continuous rate function. We deduce that
$$\liminf_{k \to \infty}\ \inf_{y \in B_{1/k}} \{ \mathcal{J}(y) \} \geq \mathcal{J}(x).$$

This leads to the contradiction $\mathcal{I}(x) \geq \mathcal{J}(x)$.

With the same notation as in Theorem 1.1, one can state a more general version of Cramér's Theorem.

Theorem 1.10. Suppose that $D_\varphi = \mathbb{R}$. Then $\{\bar{S}_n\}_{n \geq 1}$ satisfies a good large deviation principle with rate function $\varphi^*$.

Proof. We will rely heavily on the estimates obtained in Theorem 1.1.

Step 1. Upper bound.
Let $C$ be a closed subset of $\mathbb{R}$; we are going to show that
$$\limsup_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \in C \big) \leq -\inf_{x \in C} \{ \varphi^*(x) \}. \tag{1.33}$$


Figure 1.3: The set $C \subset\, ]-\infty, x_-] \cup [x_+, \infty[$ is depicted in gray.

Suppose that $\inf_{x \in C} \{ \varphi^*(x) \} > 0$, otherwise there is nothing to prove. In this case $E(X_1)$ does not belong to $C$. Let $]x_-, x_+[$ be the largest interval in $C^c$ containing $E(X_1)$ (see Figure 1.3); then from the convexity of $\varphi^*$ we deduce that
$$\inf_{x \in C} \{ \varphi^*(x) \} = \inf\big\{ \varphi^*(x_+),\ \varphi^*(x_-) \big\}.$$

Since $C \subset\, ]-\infty, x_-] \cup [x_+, \infty[$, the proof follows from a direct application of Theorem 1.1:
$$P\big( \bar{S}_n \in C \big) \leq P\big( \bar{S}_n \in [x_+, \infty[\, \big) + P\big( \bar{S}_n \in\, ]-\infty, x_-] \big) \leq 2 \exp\Big( -n \inf\big\{ \varphi^*(x_+),\ \varphi^*(x_-) \big\} \Big).$$

Step 2. Lower bound and compactness of the level sets.
Let $O$ be an open subset of $\mathbb{R}$; it is enough to show that for any $x \in O$
$$\liminf_{n \to \infty} \frac{1}{n} \log P\big( \bar{S}_n \in O \big) \geq -\varphi^*(x). \tag{1.34}$$

This follows from (1.5) by considering a neighborhood $]x - \delta, x + \delta[$ included in $O$.

From Proposition 1.5 (2), we know that the level sets are compact.

1.4 Moderate deviations

For a sequence $\{X_i\}_{i \geq 1}$ of iid variables with mean $E(X_1) = 0$ and variance $\mathrm{Var}(X_1) = \sigma^2$, the central limit theorem allows one to control the fluctuations of order $1/\sqrt{n}$ around the mean. Informally, one can say that the probability that $\bar{S}_n$ is close to $\frac{x}{\sqrt{n}}$ is approximately Gaussian for $n$ large:
$$P\left( \bar{S}_n \simeq \frac{x}{\sqrt{n}} \right) \simeq \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{x^2}{2\sigma^2} \right).$$

On the other hand, the large deviation principle (Theorem 1.10) controls deviations of order 1 away from the mean. For $x \neq 0$, the large-$n$ asymptotics can be written informally as
$$P\big( \bar{S}_n \simeq x \big) \simeq \exp\big( -n \varphi^*(x) \big).$$


The main goal of this section is to reconcile both asymptotics by investigating an intermediate range of deviations: for some $\beta \in (0, 1/2)$, we would like to estimate the probability that $\bar{S}_n$ is close to $\frac{x}{n^\beta}$.

In order to gain some intuition, we suppose that the large deviation asymptotics remain valid for deviations of order $n^{-\beta}$:
$$P\left( \bar{S}_n \simeq \frac{x}{n^\beta} \right) \simeq \exp\left( -n\, \varphi^*\!\left( \frac{x}{n^\beta} \right) \right).$$

A second-order Taylor expansion of $\varphi^*$ gives
$$\varphi^*\!\left( \frac{x}{n^\beta} \right) = \frac{\partial^2 \varphi^*(0)}{2}\, \frac{x^2}{n^{2\beta}} + O\left( \frac{1}{n^{3\beta}} \right) = \frac{1}{2\sigma^2}\, \frac{x^2}{n^{2\beta}} + O\left( \frac{1}{n^{3\beta}} \right), \tag{1.35}$$
where we used, from Proposition 1.5, that $\varphi^*(0) = \partial \varphi^*(0) = 0$ and $\partial^2 \varphi^*(0) = \frac{1}{\sigma^2}$. Thus for $n$ large, we expect that for $\beta \in (0, 1/2)$
$$P\left( \bar{S}_n \simeq \frac{x}{n^\beta} \right) \simeq \exp\left( -n^{1 - 2\beta}\, \frac{x^2}{2\sigma^2} + O\big( n^{1 - 3\beta} \big) \right) \simeq \exp\left( -n^{1 - 2\beta}\, \frac{x^2}{2\sigma^2} \right).$$

For deviations in a window $n^{-\beta}$ around the mean, only the curvature of $\varphi^*$ at the mean should be relevant, and for $\beta = 1/2$ the Gaussian limit of the central limit theorem is recovered.
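In the Gaussian case the heuristic can be tested against the exact tail, since $\bar{S}_n$ is itself Gaussian. The sketch below is ours (the helper name is hypothetical); it uses the exact Gaussian tail via erfc and checks that $n^{-(1-2\beta)} \log P(\bar{S}_n \geq a/n^\beta)$ is close to $-a^2/(2\sigma^2)$ for a large $n$.

```python
import math

def log_tail(n, a, beta, sigma=1.0):
    # Exact log P(bar(S)_n >= a/n^beta) for iid N(0, sigma^2):
    # bar(S)_n ~ N(0, sigma^2/n), so the tail is 0.5 * erfc(z / sqrt(2)).
    z = a * n ** (0.5 - beta) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

n, a, beta = 10**6, 1.0, 0.25
ratio = log_tail(n, a, beta) / n ** (1 - 2 * beta)
# The moderate deviation scaling predicts the limit -a^2 / (2 sigma^2) = -0.5.
assert abs(ratio + 0.5) < 0.01
```

The small residual discrepancy comes from the polynomial prefactor of the Gaussian tail, which is invisible at the exponential scale.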

The moderate deviation theorem justifies the previous heuristic.

Theorem 1.11 (Moderate deviations). Let $\{X_i\}_{i \geq 1}$ be a sequence of iid variables with mean $E(X_1) = 0$, variance $\mathrm{Var}(X_1) = \sigma^2$, and such that $\varphi(\lambda)$ is finite for $\lambda$ in a neighborhood of 0. For any $\beta \in (0, 1/2)$,
$$\forall a > 0, \quad \lim_{n \to \infty} \frac{1}{n^{1 - 2\beta}} \log P\left( \bar{S}_n \geq \frac{a}{n^\beta} \right) = -\frac{a^2}{2\sigma^2}.$$

Remark 1.12. In general, if a sequence $\{W_n\}_{n \geq 1}$ obeys a large deviation principle, the moderate deviations are not necessarily given by the second-order expansion of the rate function. To see this, it is enough to consider the example $W_n = \bar{S}_n \mathbf{1}_{\{\bar{S}_n > \frac{1}{n^\beta}\}}$, with $\bar{S}_n$ the empirical mean of some iid variables. $W_n$ obeys a large deviation principle with the same rate function as $\bar{S}_n$; however, their moderate deviations will be different.

Proof. The proof follows closely the derivation of Cramér's Theorem.

Step 1. Upper bound.
By Chebyshev's inequality, one has for any $\lambda \geq 0$
$$P\left( \bar{S}_n \geq \frac{a}{n^\beta} \right) \leq \exp\big( -\lambda a n^{1 - \beta} + n \varphi(\lambda) \big).$$

From (1.22), we know that $\varphi'(0) = E(X_1) = 0$ and $\varphi''(0) = \sigma^2$. Thus a Taylor expansion of $\varphi$ for $\lambda$ small gives
$$P\left( \bar{S}_n \geq \frac{a}{n^\beta} \right) \leq \exp\left( -\lambda a n^{1 - \beta} + n\, \frac{\sigma^2}{2} \lambda^2 + n\, O(\lambda^3) \right).$$


Note that
$$\inf_{\lambda \in \mathbb{R}} \left\{ -\lambda a n^{1 - \beta} + n\, \frac{\sigma^2}{2} \lambda^2 \right\} = -\frac{a^2}{2\sigma^2}\, n^{1 - 2\beta}, \tag{1.36}$$
and that the infimum is reached at $\gamma_n = \frac{a}{\sigma^2 n^\beta}$, which vanishes as $n$ tends to infinity. Thus, choosing $\lambda = \gamma_n$ in the previous inequality completes the upper bound:
$$P\left( \bar{S}_n \geq \frac{a}{n^\beta} \right) \leq \exp\left( -\frac{a^2}{2\sigma^2}\, n^{1 - 2\beta} + O\big( n^{1 - 3\beta} \big) \right).$$
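The quadratic minimization (1.36) is elementary but easy to get wrong by a factor; the snippet below is purely illustrative (hypothetical helper names) and compares the claimed minimizer $\gamma_n = \frac{a}{\sigma^2 n^\beta}$ and minimum value against a brute-force grid search.

```python
def objective(lam, n, a, beta, sigma2):
    # The expression minimized in (1.36): -lambda*a*n^(1-beta) + n*sigma^2/2*lambda^2.
    return -lam * a * n ** (1 - beta) + 0.5 * n * sigma2 * lam * lam

n, a, beta, sigma2 = 10_000, 1.5, 0.25, 2.0
gamma_n = a / (sigma2 * n ** beta)                         # claimed minimizer
predicted = -(a * a / (2 * sigma2)) * n ** (1 - 2 * beta)  # claimed minimum value

# Brute-force check on a grid around gamma_n.
grid = [gamma_n * (0.5 + i / 1000) for i in range(1001)]
brute = min(objective(l, n, a, beta, sigma2) for l in grid)

assert abs(objective(gamma_n, n, a, beta, sigma2) - predicted) < 1e-8
assert brute >= predicted - 1e-8
```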

Step 2. Lower bound.
We will follow the proof of Cramér's Theorem and consider the tilted measure
$$\mu_{\gamma_n}(dy) = \frac{\exp(\gamma_n y)}{E\big( \exp(\gamma_n X_1) \big)}\, \mu(dy) \tag{1.37}$$
with parameter $\gamma_n = \frac{a + \varepsilon}{\sigma^2 n^\beta}$ for some $\varepsilon > 0$. The parameter $\gamma_n$ plays a role analogous to $\theta$ in (1.14); however, $\gamma_n$ has to be tuned with $n$, as suggested by the derivation of the upper bound.

Using a Taylor expansion, one can check that, for $n$ large, the mean of the tilted measure is given by
$$\int_{\mathbb{R}} \mu_{\gamma_n}(dy)\, y = \varphi'(\gamma_n) = \varphi'(0) + \varphi''(0)\gamma_n + O(\gamma_n^2) = \frac{a + \varepsilon}{n^\beta} + O\left( \frac{1}{n^{2\beta}} \right). \tag{1.38}$$

Thus, if $\{\tilde{X}_n\}_{n \geq 1}$ is a sequence of iid random variables distributed according to $\mu_{\gamma_n}$, we expect that
$$\lim_{n \to \infty} \tilde{P}\left( \frac{1}{n} \sum_{i=1}^n \tilde{X}_i \in \left[ \frac{a}{n^\beta}, \frac{a + 2\varepsilon}{n^\beta} \right] \right) = 1, \tag{1.39}$$
where $\tilde{P}$ stands for the probability associated to $\mu_{\gamma_n}$. The derivation of (1.39) is postponed to the end of the proof.

Using (1.39), we will complete the derivation of the lower bound. First, we write
$$P\left( \bar{S}_n \geq \frac{a}{n^\beta} \right) \geq \int_{\mathbb{R}^n} \mu(dy_1) \ldots \mu(dy_n)\, \mathbf{1}_{A_n},$$
where
$$A_n = \left\{ (y_1, \ldots, y_n) \in \mathbb{R}^n,\ \frac{1}{n} \sum_{i=1}^n y_i \in \left[ \frac{a}{n^\beta}, \frac{a + 2\varepsilon}{n^\beta} \right] \right\}.$$

As for the lower bound of Cramér's Theorem, the change of measure leads to
$$\int_{A_n} \mu(dy_1) \ldots \mu(dy_n) = E\big( \exp(\gamma_n X_1) \big)^n \int_{A_n} \mu_{\gamma_n}(dy_1) \ldots \mu_{\gamma_n}(dy_n)\, \exp\left( -\gamma_n \sum_{i=1}^n y_i \right).$$


Since $\sum_{i=1}^n y_i$ is smaller than $n^{1 - \beta}(a + 2\varepsilon)$ for variables in $A_n$, we get
$$\int_{A_n} \mu(dy_1) \ldots \mu(dy_n) \geq \exp\big( n\varphi(\gamma_n) - \gamma_n (a + 2\varepsilon) n^{1 - \beta} \big) \int_{A_n} \mu_{\gamma_n}(dy_1) \ldots \mu_{\gamma_n}(dy_n)$$
$$= \exp\left( -\frac{a^2 + 4a\varepsilon + 3\varepsilon^2}{2\sigma^2}\, n^{1 - 2\beta} + O\big( n^{1 - 3\beta} \big) \right) \tilde{P}\left( \frac{1}{n} \sum_{i=1}^n \tilde{X}_i \in \left[ \frac{a}{n^\beta}, \frac{a + 2\varepsilon}{n^\beta} \right] \right),$$
where the last equality follows by a Taylor expansion as in (1.36). Using (1.39), we get

\[
\liminf_{n\to\infty} \frac{1}{n^{1-2\beta}} \log P\Big( S_n \geq \frac{a}{n^{\beta}} \Big) \geq -\frac{a^2+4a\varepsilon+2\varepsilon^2}{2\sigma^2}. \tag{1.40}
\]
This holds for any $\varepsilon > 0$; thus, letting $\varepsilon$ tend to 0, the derivation of the lower bound is completed.

It remains now to prove (1.39). The law of large numbers cannot be used directly, as the mean of $\widetilde X_1$ varies with $n$. To get around this difficulty, we are going to prove a large deviation upper bound from which (1.39) will follow. We define
\[
\widetilde\varphi(\lambda) = \log \widetilde E\big( \exp(\lambda \widetilde X_1) \big) = \varphi(\lambda+\gamma_n) - \varphi(\gamma_n),
\]
where the last equality follows from the identity
\[
\log \int_{\mathbb{R}} \mu_{\gamma_n}(dy) \exp(\lambda y) = \log E\big( \exp((\lambda+\gamma_n) X_1) \big) - \log E\big( \exp(\gamma_n X_1) \big).
\]

Using again a Taylor expansion and the fact that $\gamma_n = \frac{a+\varepsilon}{\sigma^2 n^{\beta}}$,
\[
\widetilde\varphi\Big( \frac{\alpha}{n^{\beta}} \Big)
= \frac{\varphi''(0)}{2}\,\frac{1}{n^{2\beta}} \Big( \big( \alpha + (a+\varepsilon)/\sigma^2 \big)^2 - \frac{(a+\varepsilon)^2}{\sigma^4} \Big) + O\Big(\frac{1}{n^{3\beta}}\Big)
= \frac{1}{n^{2\beta}} \Big( \frac{\sigma^2}{2}\alpha^2 + \alpha(a+\varepsilon) \Big) + O\Big(\frac{1}{n^{3\beta}}\Big).
\]

This implies that for $\alpha > 0$ and $n$ large enough
\[
\widetilde P\Big( \frac{1}{n}\sum_{i=1}^n \widetilde X_i < \frac{a}{n^{\beta}} \Big)
\leq \exp\Big( n^{1-\beta}\, a\, \frac{\alpha}{n^{\beta}} + n\, \widetilde\varphi\Big( -\frac{\alpha}{n^{\beta}} \Big) \Big)
\leq \exp\Big( n^{1-2\beta} \Big[ -\varepsilon\alpha + \frac{\sigma^2}{2}\alpha^2 \Big] + O\big( n^{1-3\beta} \big) \Big).
\]
For any $\varepsilon > 0$, one can choose $\alpha$ small enough such that this probability decays exponentially fast as $n$ tends to $\infty$. In the same way, one can show that the probability of $\big\{ \frac{1}{n}\sum_{i=1}^n \widetilde X_i > \frac{a+2\varepsilon}{n^{\beta}} \big\}$ tends to 0. This concludes the proof of (1.39).

The moderate deviation theorem shows that the rate function $\varphi^*$ of Cramer's Theorem also quantifies the deviations close to the mean $E(X)$. In fact, in the case of i.i.d. variables, $\varphi^*$ encodes much more information than a simple characterization of the atypical events. Indeed, if the Laplace transform $\lambda \mapsto E\big(\exp(\lambda X)\big)$ is finite on $\mathbb{R}$, then it fully characterizes the law of $X$. This means that $\varphi$ also characterizes the law of $X$. As $\varphi$ is convex, Proposition 1.6 implies that $\varphi = \varphi^{**}$, so that $\varphi^*$ encodes as much information as $\varphi$.


Chapter 2

Applications.

2.1 Branching process

A motivation for considering large deviations for branching processes comes from the following problem from biology¹. We consider cells infected by parasites. Inside each cell, the parasites reproduce according to a branching process (with the same law in every cell). At each unit of time, a cell splits into two cells, and we assume that the parasites are divided equally between the two new cells.

At time $n$, there are $2^n$ cells, but the number of parasites inside each cell varies. One would like to estimate $N_n$, the number of cells which are not infected at time $n$. As the cells are indistinguishable, we get
\[
E(N_n) = 2^n\, P(\text{a cell is not infected}).
\]
The number of parasites inside a single cell can be approximated by a Galton-Watson process. Thus we have reduced the problem to estimating the probability of observing an atypical number of parasites in a given cell, i.e. a large deviation of a Galton-Watson process. Note that the probability that a cell is not infected vanishes exponentially fast, but it is multiplied by the factor $2^n$; thus the number of non-infected cells may not be negligible.

2.1.1 The model

Consider a Galton-Watson process with offspring given by a collection of i.i.d. variables $\{\zeta^t_i\}_{i\geq 1, t\geq 1}$ distributed according to the reproduction law
\[
\forall k \geq 0, \qquad P(\zeta^t_i = k) = p_k.
\]
We assume that
\[
p_0 = 0 \qquad \text{and} \qquad 0 < p_1 < 1.
\]
The population at time $t$ is denoted by $Z_t$ and
\[
\forall t \in \mathbb{N}, \qquad Z_{t+1} = \zeta^{t+1}_1 + \dots + \zeta^{t+1}_{Z_t}.
\]

¹ V. Bansaye, Proliferating parasites in dividing cells: Kimmel's branching model revisited, Annals of Applied Probability 18(3), 967-996 (2008).


The average number of descendants is denoted by $\mu = E(\zeta^t_i) > 1$, so that the mean population size is given by
\[
E(Z_{t+1}) = \sum_{n=0}^{\infty} E(Z_{t+1} \mid Z_t = n)\, P(Z_t = n) = \mu \sum_{n=0}^{\infty} n\, P(Z_t = n) = \mu\, E(Z_t) = \mu^{t+1}. \tag{2.1}
\]

To quantify the large deviations of the population size, we assume that the offspring distribution is supported on $\{1,\dots,b\}$ and that
\[
\forall \lambda \in \mathbb{R}, \qquad \varphi(\lambda) = \log E\big( \exp(\lambda\, \zeta^1_1) \big) = \log\Big( \sum_{k=1}^{b} \exp(\lambda k)\, p_k \Big) < \infty. \tag{2.2}
\]

2.1.2 Large deviations

We are going to prove that the probability of observing a large deviation behaves differently depending on whether the population $Z_t$ is smaller or larger than the mean $\mu^t$.

Theorem 2.1. Let $1 \leq a \leq \mu$; then
\[
\lim_{t\to\infty} \frac{1}{t} \log P(Z_t \leq a^t) = \Big( 1 - \frac{\log a}{\log \mu} \Big) \log p_1. \tag{2.3}
\]
Let $a > \mu$; then
\[
\limsup_{t\to\infty} \frac{1}{t} \log P(Z_t \geq a^t) = -\infty. \tag{2.4}
\]

Proof. We start by investigating the deviations (2.3) below the mean. Given $a \leq \mu$, we introduce $\alpha \leq 1$ such that
\[
a = \mu^{\alpha} \quad \Longleftrightarrow \quad \alpha = \frac{\log a}{\log \mu}.
\]

Step 1. We will first prove the following lower bound:
\[
\liminf_{t\to\infty} \frac{1}{t} \log P(Z_t \leq a^t) \geq \Big( 1 - \frac{\log a}{\log \mu} \Big) \log p_1. \tag{2.5}
\]

This lower bound can be interpreted as follows: to reduce the population $Z_t$, it is sufficient to prevent any growth up to a time of order $(1-\alpha)t$ and then to let the population grow without constraint (see figure 2.1). For any $\varepsilon > 0$, we write
\[
P(Z_t \leq a^t) \geq P\big( \{ Z_{\lfloor (1+\varepsilon)(1-\alpha)t \rfloor} = 1 \} \cap \{ Z_t \leq a^t \} \big)
= P\big( Z_{t(1-(1+\varepsilon)(1-\alpha))} \leq a^t \big)\, p_1^{(1+\varepsilon)(1-\alpha)t}
\geq P\big( Z_{(\alpha-\varepsilon(1-\alpha))t} \leq a^t \big)\, p_1^{(1+\varepsilon)(1-\alpha)t},
\]
where $\lfloor \cdot \rfloor$ stands for the integer part. Conditionally on $Z_{\lfloor (1+\varepsilon)(1-\alpha)t \rfloor} = 1$, the population at time $t$ is the same as the population of a tree at time $(\alpha - \varepsilon(1-\alpha))t$. Thus (2.5) will be complete once we show that
\[
\forall \varepsilon > 0, \qquad \lim_{t\to\infty} P\big( Z_{(\alpha-\varepsilon(1-\alpha))t} \leq a^t \big) = 1. \tag{2.6}
\]


Indeed, this would imply that for any $\varepsilon > 0$
\[
\liminf_{t\to\infty} \frac{1}{t} \log P(Z_t \leq a^t) \geq (1+\varepsilon)(1-\alpha) \log p_1.
\]
Letting $\varepsilon$ tend to 0 and using that $\alpha = \frac{\log a}{\log \mu}$ leads to (2.5).

We turn now to the proof of (2.6). First notice that for any $\delta > 0$
\[
\lim_{n\to\infty} P\big( Z_n \leq (\mu+\delta)^n \big) = 1. \tag{2.7}
\]
This follows from Tchebyshev's inequality:
\[
P\big( Z_n \geq (\mu+\delta)^n \big) \leq \frac{1}{(\mu+\delta)^n}\, E(Z_n) = \frac{\mu^n}{(\mu+\delta)^n}.
\]
Since $a = \mu^{\alpha}$, one can choose $\delta > 0$ such that $(\mu+\delta)^{\alpha-\varepsilon(1-\alpha)} \leq \mu^{\alpha} = a$. The limit (2.6) is then a consequence of (2.7).

Step 2. We turn now to the derivation of the upper bound
\[
\limsup_{t\to\infty} \frac{1}{t} \log P(Z_t \leq a^t) \leq \Big( 1 - \frac{\log a}{\log \mu} \Big) \log p_1. \tag{2.8}
\]
From Tchebyshev's inequality, one has with the notation (2.2)
\[
P(Z_t \leq a^t) \leq \exp(1)\, E\big( \exp(-a^{-t} Z_t) \big) = \exp(1)\, E\big( \exp\big( \varphi(-a^{-t})\, Z_{t-1} \big) \big).
\]
Iterating the procedure, we get
\[
P(Z_t \leq a^t) \leq \exp(1) \exp\big( \Phi_t(-a^{-t}) \big) \qquad \text{with} \qquad \Phi_t(x) = \underbrace{\varphi \circ \varphi \circ \dots \circ \varphi}_{t \text{ times}}(x). \tag{2.9}
\]

The asymptotic behavior of $n \mapsto \Phi_n(-a^{-t})$ can be understood from the graph of $\varphi$ plotted in figure 2.1. As $a^{-t}$ is very close to 0, $\Phi_n(-a^{-t})$ also remains close to 0 for $n$ not too large. As soon as $\Phi_n(-a^{-t})$ becomes less than $-1$, the sequence starts to diverge towards $-\infty$, in a way which can be controlled by the asymptotic behavior of $\varphi$ at $-\infty$. We are going to prove that for any $\varepsilon > 0$ small enough, one can find $c > 0$ such that
\[
\limsup_{t\to\infty} \Phi_{\lfloor (1+\varepsilon)\alpha t \rfloor}(-a^{-t}) < -c. \tag{2.10}
\]
Recall that $\varphi'(0) = \mu$, so that in a neighborhood of the origin $\varphi(x)$ is well approximated by $x \mapsto \mu x$ (see figure 2.1). For a given $\varepsilon > 0$, set $m = \mu^{1/(1+\varepsilon)} < \mu$ and note that $m > 1$. Thus there exists $c_\varepsilon \in (0,1)$ such that
\[
\forall x \in [-c_\varepsilon, 0], \qquad \varphi(x) \leq m x.
\]
Thus, as long as the iterates $\Phi_k(-a^{-t})$, $k \leq n$, remain in $[-c_\varepsilon, 0]$, one has
\[
\Phi_n(-a^{-t}) \leq -\frac{m^n}{a^t}.
\]



Figure 2.1: The large deviation mechanism for lowering the population size is depicted on the right. The graph of $\varphi$ is represented on the left picture. The tangent at the origin $x \mapsto \mu x$ is represented with dots. When $x$ tends to infinity, $\varphi$ is asymptotically parallel to the dashed straight line $x \mapsto x$.

Suppose that
\[
\limsup_{t\to\infty} \Phi_{\lfloor (1+\varepsilon)\alpha t \rfloor}(-a^{-t}) \geq -c_\varepsilon;
\]
then one would get a contradiction, as
\[
\Phi_{\lfloor (1+\varepsilon)\alpha t \rfloor}(-a^{-t}) \leq -\frac{m^{(1+\varepsilon)\alpha t}}{a^t} = -\Big( \frac{m^{(1+\varepsilon)\alpha}}{a} \Big)^t = -1,
\]
where we used that $a = \mu^{\alpha}$ and $m = \mu^{1/(1+\varepsilon)}$. Thus (2.10) holds with $c = c_\varepsilon$.

From the asymptotic behavior of $\varphi$,
\[
\lim_{\lambda\to-\infty} \varphi(\lambda) - \lambda = \log p_1,
\]
one can deduce that for any $c > 0$
\[
\lim_{t\to\infty} \frac{1}{t}\, \Phi_t(-c) = \log p_1.
\]

From (2.10) and the fact that $\Phi_n$ is increasing, one gets for $t$ large enough
\[
\Phi_t(-a^{-t}) = \Phi_{t-\lfloor (1+\varepsilon)t\alpha \rfloor}\big( \Phi_{\lfloor (1+\varepsilon)\alpha t \rfloor}(-a^{-t}) \big) \leq \Phi_{t-\lfloor (1+\varepsilon)t\alpha \rfloor}(-c)
\]
so that
\[
\limsup_{t\to\infty} \frac{1}{t}\, \Phi_t(-a^{-t}) \leq (1-\alpha-\varepsilon\alpha) \log p_1.
\]
Using (2.9), this implies
\[
\limsup_{t\to\infty} \frac{1}{t} \log P(Z_t \leq a^t) \leq (1-\alpha-\varepsilon\alpha) \log p_1. \tag{2.11}
\]
Since the inequality is valid for any $\varepsilon > 0$, letting $\varepsilon$ tend to 0 concludes the proof of (2.8).


It remains to show the bound (2.4) for $a > \mu$. Applying Tchebyshev's estimate as in (2.9), we get for $\lambda > 0$
\[
P(Z_t \geq a^t) \leq \exp(-\lambda a^t)\, E\big( \exp(\lambda Z_t) \big) \leq \exp(-\lambda a^t) \exp\big( \Phi_t(\lambda) \big).
\]
Recall that $\varphi(x)$ behaves as $x \mapsto \mu x$ close to 0. Choosing $\lambda = \frac{1}{u^t}$ with $u \in\, ]\mu, a[$, we get
\[
\Phi_t\Big( \frac{1}{u^t} \Big) \simeq \frac{\mu^t}{u^t} \simeq 0 \qquad \Rightarrow \qquad \lim_{t\to\infty} \Phi_t\Big( \frac{1}{u^t} \Big) = 0.
\]
Thus, we recover (2.4):
\[
\limsup_{t\to\infty} \frac{1}{t} \log P(Z_t \geq a^t) \leq -\limsup_{t\to\infty} \frac{1}{t} \Big( \frac{a}{u} \Big)^t = -\infty.
\]

2.2 Directed Polymer

2.2.1 The model

We consider a sequence of i.i.d. random variables $\{\omega_{i,j}\}_{i\geq 1, j\geq 1}$ with a smooth density supported on $[0, M]$. The mean is denoted by $m = E(\omega_{1,1}) \in\, ]0, M[$.

A directed path $\gamma = \{x_0, \dots, x_{2n}\}$ in $\mathbb{N}^2$ from $(0,0)$ to $(n,n)$ is a collection of vertices such that

• $x_0 = (0,0)$ and $x_{2n} = (n,n)$;

• each step $(x_i, x_{i+1})$ is oriented to the right or up.

The energy of the path $\gamma$ is the sum of the random variables along the path:
\[
E(\gamma) = \sum_{i=1}^{2n} \omega_{x_i}.
\]
We are interested in the minimal energy over all the directed paths from $(0,0)$ to $(n,n)$:
\[
E_n = \min_{\gamma} \big\{ E(\gamma) \big\}. \tag{2.12}
\]
More generally, for any $x$ in $\mathbb{N}^2$, we denote by $E_n(x)$ the minimal energy over all the directed paths from $x$ to $x + (n,n)$. One can show that there is a value $\mu > 0$ such that
\[
\lim_{n\to\infty} \frac{1}{n}\, E\big(E_n\big) = \mu < 2m = 2 E(\omega_{1,1}). \tag{2.13}
\]
We are interested in the large deviations of $E_n$ from its mean.
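For a given realization of the environment, $E_n$ can be computed exactly by dynamic programming over the directed lattice. The sketch below is an illustration under arbitrary assumptions (uniform weights on $[0, M]$ with $M = 1$, so $m = 1/2$); it checks that $E_n/n$ falls well below $2m$, in line with (2.13).

```python
import random

def min_path_energy(omega):
    """Minimal energy over directed (right/up) paths from (0,0) to the
    opposite corner of the grid carrying the weights omega[i][j], by
    dynamic programming: cost(i,j) = omega[i][j] + min(cost(i-1,j), cost(i,j-1))."""
    size = len(omega)
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = cost[0][j - 1]
            elif j == 0:
                prev = cost[i - 1][0]
            else:
                prev = min(cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = omega[i][j] + prev
    return cost[size - 1][size - 1]

rng = random.Random(0)
M, n = 1.0, 200
omega = [[rng.uniform(0.0, M) for _ in range(n + 1)] for _ in range(n + 1)]
e_n = min_path_energy(omega)   # the starting vertex is included here
ratio = e_n / n                # should settle below 2*m = 1
```

Even a greedy path (choosing the cheaper of the two neighbors at each step) already achieves roughly $2n \cdot E(\min(\omega,\omega')) = 2n/3$ for uniform weights, so the true minimum per unit length sits clearly below $2m$.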


2.2.2 Large deviations

The large deviations of $E_n$ above or below the mean $\mu$ obey different scalings.

Theorem 2.2. There exists a function $\mathcal{F} :\, ]0, \mu] \to \mathbb{R}_+$ such that
\[
\forall \alpha \leq \mu, \qquad \lim_{n\to\infty} \frac{1}{n} \log P\big( E_n \leq \alpha n \big) = -\mathcal{F}(\alpha). \tag{2.14}
\]
For any $\alpha > 2m$,
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( E_n \geq \alpha n \big) = -\infty. \tag{2.15}
\]

Proof. The proof is split into two steps.

Step 1. Proof of the limit (2.14). We fix $\alpha \leq \mu$. The minimal energy of the paths from $(0,0)$ to $(n+m, n+m)$ is less than the minimal energy of the paths constrained to go through $(n,n)$:
\[
P\big( E_{n+m} \leq \alpha(n+m) \big) \geq P\big( \{E_n \leq \alpha n\} \cap \{E_m(n,n) \leq \alpha m\} \big).
\]
These constrained paths can be decomposed into a path from $(0,0)$ to $(n,n)$ followed by a path from $(n,n)$ to $(n+m, n+m)$:
\[
P\big( E_{n+m} \leq \alpha(n+m) \big) \geq P\big( E_n \leq \alpha n \big)\, P\big( E_m(n,n) \leq \alpha m \big) = P\big( E_n \leq \alpha n \big)\, P\big( E_m \leq \alpha m \big),
\]
where we used that the variables in the squares $[0,n] \times [0,n]$ and $[n, n+m] \times [n, n+m]$ are independent and identically distributed.

Introducing
\[
f_n = \log P\big( E_n \leq \alpha n \big),
\]
the previous inequality can be rewritten as
\[
\forall n, m \geq 1, \qquad f_{n+m} \geq f_n + f_m. \tag{2.16}
\]
Thus, we have shown that the sequence $\{f_n\}_{n\geq 1}$ is superadditive. The limit (2.14) is a consequence of the following proposition.

Proposition 2.3. For any sequence $\{f_n\}_{n\geq 1}$ in $\mathbb{R}$ satisfying the superadditivity assumption (2.16), the following limit holds:
\[
\lim_{n\to\infty} \frac{1}{n} f_n = \sup_{n\geq 1} \Big\{ \frac{1}{n} f_n \Big\}.
\]

Proof of Proposition 2.3. For any $r \geq 1$, we can decompose $n = kr + \ell$ with $\ell < r$. From the superadditivity (2.16), one has
\[
f_n \geq f_{kr} + f_\ell \geq k f_r + f_\ell.
\]
Letting $n$ tend to infinity, we get
\[
\liminf_{n\to\infty} \frac{1}{n} f_n \geq \frac{1}{r} f_r.
\]
As this holds for any $r \geq 1$,
\[
\liminf_{n\to\infty} \frac{1}{n} f_n \geq \sup_{r\geq 1} \Big\{ \frac{1}{r} f_r \Big\}.
\]
Since trivially $\limsup_{n\to\infty} \frac{1}{n} f_n \leq \sup_{r\geq 1} \big\{ \frac{1}{r} f_r \big\}$, this completes the proof.
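Proposition 2.3 (Fekete's superadditive lemma) is easy to illustrate numerically. The sketch below uses the toy superadditive sequence $f_n = -\sqrt{n}$ (an arbitrary choice for illustration: $\sqrt{\cdot}$ is subadditive, so $-\sqrt{\cdot}$ is superadditive, with $\sup_n f_n/n = 0$).

```python
import math

def fekete_limit(f, n_max=100_000):
    """For a superadditive sequence f(n+m) >= f(n) + f(m), Proposition 2.3
    states lim f(n)/n = sup_n f(n)/n; compare the two quantities numerically."""
    sup_ratio = max(f(n) / n for n in range(1, n_max + 1))
    tail_ratio = f(n_max) / n_max          # proxy for the limit
    return sup_ratio, tail_ratio

# Toy superadditive example: f(n) = -sqrt(n), for which sup f(n)/n = 0.
f = lambda n: -math.sqrt(n)
sup_ratio, tail_ratio = fekete_limit(f)
```

Here $f_n/n = -1/\sqrt{n}$ increases towards its supremum $0$, so the supremum over $n \leq n_{\max}$ and the tail value coincide, as the proposition predicts.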


Step 2. Proof of inequality (2.15).


Figure 2.2: The directed paths $\{\gamma_i\}_{i\leq 4}$ are disjoint in the domain bordered by the dashed segments.

For any integer $K$, we can build (see figure 2.2) a collection of $K$ directed paths $\gamma_i = \{x^{(i)}_0, \dots, x^{(i)}_{2n}\}$ such that:

• each path joins $(0,0)$ to $(n,n)$;

• at distance $2K$ from $(0,0)$ and $(n,n)$ the paths do not intersect, i.e. the sets $\{x^{(i)}_{2K}, \dots, x^{(i)}_{2n-2K}\}$ are disjoint for $i \leq K$.

For any path $\gamma_i$, we define the truncated energy
\[
\widehat E(\gamma_i) = \sum_{j=2K}^{2n-2K} \omega_{x^{(i)}_j}.
\]
By construction, the variables $\{\widehat E(\gamma_i)\}_{i\leq K}$ are independent. Since the environment $\omega$ takes values in $[0, M]$, one has $E(\gamma_i) \leq \widehat E(\gamma_i) + 4KM$. Thus

\[
P\big( E_n \geq \alpha n \big) \leq P\Big( \bigcap_{i=1}^K \big\{ E(\gamma_i) \geq \alpha n \big\} \Big) \leq P\Big( \bigcap_{i=1}^K \big\{ \widehat E(\gamma_i) \geq \alpha n - 4KM \big\} \Big) \leq \prod_{i=1}^K P\big( \widehat E(\gamma_i) \geq \alpha n - 4KM \big),
\]
where we used the independence of the truncated energies in the last step. The truncated energy $\widehat E(\gamma_i)$ of a given path $\gamma_i$ is the sum of $2n - 4K$ independent random variables with mean $m$. It therefore obeys a large deviation principle with rate function $\mathcal{I}$. As $\alpha > 2m$, one can find $\varepsilon > 0$ and $n$ large enough such that $\frac{4KM}{n} \leq \varepsilon < \alpha - 2m$; then
\[
P\Big( \frac{1}{n}\, \widehat E(\gamma_i) \geq \alpha - \frac{4KM}{n} \Big) \leq \exp\big( -2n\, \mathcal{I}(\alpha-\varepsilon) \big) \qquad \text{with} \quad \mathcal{I}(\alpha-\varepsilon) > 0.
\]


This implies that for $n$ large enough
\[
P\big( E_n \geq \alpha n \big) \leq \exp\big( -2nK\, \mathcal{I}(\alpha-\varepsilon) \big) \qquad \Rightarrow \qquad \limsup_{n\to\infty} \frac{1}{n} \log P\big( E_n \geq \alpha n \big) \leq -2K\, \mathcal{I}(\alpha-\varepsilon).
\]
Since $\mathcal{I}(\alpha-\varepsilon) > 0$, we can then let $K$ tend to infinity to complete the proof of inequality (2.15).


Chapter 3

Gärtner-Ellis Theorem

3.1 Statement of the Gärtner-Ellis Theorem

Cramer’s Theorem can be extended to i.i.d. variables {Xi}i > 1 taking values in Rd withd > 2. In this case, we shall use the notation:

• The empirical mean

∀n > 1, Sn =1n

n

∑i=1

Xi ∈ Rd.

• The Log-Laplace transform is given by

∀λ ∈ Rd, ϕ(λ) = log(E(

exp(〈λ · X1〉)))

with Dϕ ={

λ ∈ Rd, ϕ(λ) < ∞}

,(3.1)

where 〈x · y〉 = ∑di=1 xiyi stands for the scalar product for x, y ∈ Rd.

• The Legendre transform of ϕ is defined as

∀x ∈ Rd, ϕ∗(x) = supλ∈Rd

{〈x ·λ〉− ϕ(λ)

}with Dϕ∗ =

{x ∈ Rd, ϕ∗(x) < ∞

}.

(3.2)

The extension of Cramer’s Theorem in dimension d > 2 is given by

Theorem 3.1 (Cramer). Suppose that Dϕ contains a neighborhood of 0, then Sn obeys a goodlarge deviation principle in Rd with rate function ϕ∗.

Theorem 3.1 will be obtained as a consequence of a more general theorem whichrelaxes the independence assumption. We consider a sequence {Zn}n > 1 of random vari-ables in Rd which will play a role analogous to the empirical mean Sn in the i.i.d. case.The independence will be replaced by the following assumption :

Assumption 3.2. The limit

∀λ ∈ Rd, Φ(λ) := limn→∞

1n

log E(

exp(n〈λ · Zn〉))∈ R∪ {∞} (3.3)

exists and is finite in a neighborhood of 0. We set DΦ ={

λ ∈ Rd, Φ(λ) < ∞}

. Assume alsothat Φ is lower semi-continuous.


The function $\Phi$ is convex and its Legendre transform is given by
\[
\forall x \in \mathbb{R}^d, \qquad \Phi^*(x) = \sup_{\lambda \in \mathbb{R}^d} \big\{ \langle x \cdot \lambda \rangle - \Phi(\lambda) \big\} \qquad \text{with} \quad \mathcal{D}_{\Phi^*} = \big\{ x \in \mathbb{R}^d,\ \Phi^*(x) < \infty \big\}. \tag{3.4}
\]
By construction, $\Phi^*$ is also a convex function (see Proposition 1.6). For the Gärtner-Ellis Theorem, we will focus on the points $x$ in $\mathbb{R}^d$ where $\Phi^*$ is strictly convex. A point $x \in \mathbb{R}^d$ is exposed if there exists $v \in \mathbb{R}^d$ such that
\[
\forall y \neq x, \qquad \Phi^*(y) - \Phi^*(x) > \big\langle v \cdot (y-x) \big\rangle. \tag{3.5}
\]
Note that $\Phi^*$ is not necessarily differentiable at $x$; the vector $v \in \mathbb{R}^d$ is only a sub-gradient.

Theorem 3.3 (Gärtner-Ellis). Under Assumption 3.2, the sequence $\{Z_n\}_{n\geq 1}$ satisfies:

• for any closed subset $C$ of $\mathbb{R}^d$,
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in C \big) \leq - \inf_{x \in C} \big\{ \Phi^*(x) \big\}; \tag{3.6}
\]

• for any exposed point $x$ in $\mathbb{R}^d$ (see (3.5)),
\[
\forall \varepsilon > 0, \qquad \liminf_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in B(x,\varepsilon) \big) \geq -\Phi^*(x), \tag{3.7}
\]
where $B(x,\varepsilon) = \{ y \in \mathbb{R}^d,\ \|y-x\| \leq \varepsilon \}$.

Furthermore, the large deviation principle is good, i.e. the level sets of $\Phi^*$ are compact.

Cramer’s Theorem 3.1 in Rd can be deduced from the Gärtner-Ellis Theorem. Indeedfor Zn = Sn the independence implies that

∀n > 1,1n

log E(

exp(n〈λ · Zn〉))= log E

(exp(〈λ · X1〉)

)so that Φ(λ) = ϕ(λ) and Φ∗(x) = ϕ∗(x). A straightforward extension of Theorem 1.5shows that the function ϕ∗ is strictly convex and differentiable in the interior of Dϕ∗ . Inparticular for any x in the interior Dϕ∗ , one has

∀y 6= x, ϕ∗(y)− ϕ∗(x) >⟨∇ϕ∗(x) · (y− x)

⟩. (3.8)

The derivative∇ϕ∗(x) can be interpreted in terms of the Legendre transform. Recall thatthe supremum in the variational formula (3.2) is reached at the value λx such that

x = ∇ϕ(λx)

and ϕ∗(x) = 〈x · λx〉 − ϕ(λx).

From the duality Theorem 1.6, ϕ(λx)= supy∈Rd

{〈y · λx〉 − ϕ∗(y)

}and the supremum is

reached at the value x. From this we deduce the relation λx = ∇ϕ∗(x). The values of λxand x are said to be conjugate.
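The conjugacy relation $\lambda_x = \nabla\varphi^*(x)$ can be checked numerically on a toy example. In the sketch below, $\varphi$ is the standard Gaussian log-Laplace $\varphi(\lambda) = \lambda^2/2$ (an illustrative assumption, not taken from the text), whose Legendre transform is $\varphi^*(x) = x^2/2$, so the conjugate point of $x$ is $\lambda_x = x$.

```python
def legendre(g, x, lams):
    """Numerical Legendre transform g*(x) = sup_lambda (x*lambda - g(lambda)),
    with the supremum restricted to the finite grid `lams`."""
    return max(x * lam - g(lam) for lam in lams)

# Standard Gaussian log-Laplace: phi(lam) = lam^2/2, so phi*(x) = x^2/2.
phi = lambda lam: lam * lam / 2.0
lams = [k / 100.0 for k in range(-500, 501)]   # grid on [-5, 5]
x = 1.7
phi_star = legendre(phi, x, lams)
# The maximizing lambda approximates the conjugate point lambda_x = x.
lam_x = max(lams, key=lambda lam: x * lam - phi(lam))
```

On the grid, the supremum is attained at $\lambda_x \approx 1.7 = x$ and $\varphi^*(1.7) \approx 1.7^2/2$, in agreement with the duality discussion above.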


3.2 Proof of Gärtner-Ellis Theorem

Step 1. Good rate function. By Assumption 3.2, $\Phi$ is finite in a neighborhood of 0, so that the same proof as in Proposition 1.5 implies that $\Phi^*$ is lower semi-continuous and that
\[
\lim_{\|x\|\to\infty} \Phi^*(x) = \infty. \tag{3.9}
\]
Thus $\Phi^*$ is a good rate function.

Step 2. The compact sets. We first derive the upper bound (3.6) for a compact set $K \subset \mathbb{R}^d$:
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in K \big) \leq - \inf_{x \in K} \big\{ \Phi^*(x) \big\}. \tag{3.10}
\]
For any $\delta > 0$, we consider a modification of $\Phi^*$:
\[
\Phi^*_\delta(x) = \min\Big\{ \Phi^*(x) - \delta,\ \frac{1}{\delta} \Big\}.
\]
Since $\Phi^*_\delta(x) < \Phi^*(x)$, one can find a vector $\lambda_x$ such that
\[
\langle x \cdot \lambda_x \rangle - \Phi(\lambda_x) > \Phi^*_\delta(x). \tag{3.11}
\]
For any $x$, we consider the half-space
\[
\mathcal{H}_{\delta,x} = \big\{ y \in \mathbb{R}^d,\ \langle (y-x) \cdot \lambda_x \rangle > -\delta \big\},
\]
which is an open neighborhood of $x$.


Figure 3.1: In the left picture, the set $C$ lies in $\mathbb{R}^d$ and the large deviation function $\Phi^*$ is represented in $\mathbb{R}^{d+1}$. To a given point $x$ in $C$, one can associate a level set represented in dashed line. The projection of this level set in $\mathbb{R}^d$ is depicted on the right, and in $\mathcal{H}_x$ the function $\Phi^*$ is larger than $\Phi^*(x)$.

Remark 3.4. The main reason for considering $\Phi^*_\delta(x)$ instead of $\Phi^*(x)$ is that the supremum in the variational representation (3.4) of $\Phi^*$ may not be reached. Suppose that $\Phi^*$ is strictly convex and differentiable, and that the supremum in (3.4) is reached at $\lambda_x = \nabla\Phi^*(x)$. In this case, the limiting half-space when $\delta$ tends to 0 is
\[
\mathcal{H}_x = \big\{ y \in \mathbb{R}^d,\ \langle (y-x) \cdot \nabla\Phi^*(x) \rangle > 0 \big\}.
\]
From the convexity of $\Phi^*$, we see that
\[
\forall y \in \mathcal{H}_x, \qquad \Phi^*(y) \geq \Phi^*(x) + \langle (y-x) \cdot \nabla\Phi^*(x) \rangle > \Phi^*(x).
\]
Thus the large deviation function $\Phi^*$ reaches its minimum on the closure of $\mathcal{H}_x$ at the point $x$.

The compact set $K$ can be covered by the union of open half-spaces: $K \subset \bigcup_{x \in K} \mathcal{H}_{\delta,x}$. By compactness, a finite covering can be extracted:
\[
K \subset \bigcup_{i \leq K} \mathcal{H}_{\delta,x_i} \qquad \text{with} \quad x_i \in K.
\]
At this stage, one can mimic the proof of Cramer's Theorem in $\mathbb{R}$:
\[
P\big( Z_n \in K \big) \leq \sum_{i \leq K} P\big( Z_n \in \mathcal{H}_{\delta,x_i} \big) = \sum_{i \leq K} P\big( \langle (Z_n - x_i) \cdot \lambda_{x_i} \rangle > -\delta \big)
\]
\[
\leq \sum_{i \leq K} \exp\big( n\delta - n \langle x_i \cdot \lambda_{x_i} \rangle \big)\, E\big( \exp( n \langle Z_n \cdot \lambda_{x_i} \rangle ) \big) = \sum_{i \leq K} \exp\big( n\delta - n \langle x_i \cdot \lambda_{x_i} \rangle + n \Phi_n(\lambda_{x_i}) \big),
\]
where by Assumption 3.2
\[
\Phi_n(\lambda) = \frac{1}{n} \log E\big( \exp( n \langle \lambda \cdot Z_n \rangle ) \big) \ \xrightarrow{\ n\to\infty\ }\ \Phi(\lambda).
\]

Recall the following lemma.

Lemma 3.5. Let $\{a_1(n), \dots, a_K(n)\}$ be a collection of $K$ sequences decaying exponentially as
\[
\forall i \leq K, \qquad \limsup_{n\to\infty} \frac{1}{n} \log a_i(n) = -\alpha_i
\]
for some $\{\alpha_1, \dots, \alpha_K\}$ in $\mathbb{R}^K$. Then
\[
\limsup_{n\to\infty} \frac{1}{n} \log \Big[ \sum_{i=1}^K a_i(n) \Big] = -\min_{i \leq K} \{\alpha_i\}.
\]

Using Lemma 3.5 and (3.11), we conclude that
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in K \big) \leq \delta - \min_{i \leq K} \big\{ \langle x_i \cdot \lambda_{x_i} \rangle - \Phi(\lambda_{x_i}) \big\} \leq \delta - \min_{i \leq K} \big\{ \Phi^*_\delta(x_i) \big\} \leq \delta - \inf_{x \in K} \big\{ \Phi^*_\delta(x) \big\}.
\]
Letting $\delta$ tend to 0, we recover (3.10).

Step 3. Exponential tightness. Building on the previous step (3.10), we are going to complete the derivation of (3.6) for a general closed set $C$.


For any $K \geq 1$, we consider the compact set
\[
\mathcal{B}_K = \{ x \in \mathbb{R}^d;\ \forall i \leq d,\ |x_i| \leq K \}.
\]
We will first show that the sequence $\{Z_n\}$ is exponentially tight, i.e. that
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \notin \mathcal{B}_K \big) \leq -C_K, \qquad \text{with} \quad \lim_{K\to\infty} C_K = \infty. \tag{3.12}
\]
By Assumption 3.2, there is $\delta > 0$ such that $\Phi(\pm\delta\, \vec e_i) < \infty$ for any vector of the orthonormal basis $\{\vec e_i\}_{i\leq d}$. From Chebyshev's inequality,
\[
P\big( Z_n \notin \mathcal{B}_K \big) \leq \sum_{i=1}^d P\big( \langle Z_n \cdot \vec e_i \rangle > K \big) + P\big( \langle Z_n \cdot \vec e_i \rangle < -K \big)
\]
\[
\leq \exp(-n\delta K) \sum_{i=1}^d E\big( \exp( n\delta \langle Z_n \cdot \vec e_i \rangle ) \big) + E\big( \exp( -n\delta \langle Z_n \cdot \vec e_i \rangle ) \big).
\]
We deduce that for any $K$
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \notin \mathcal{B}_K \big) \leq -\delta K + \max_{i \leq d} \big\{ \Phi(\pm\delta\, \vec e_i) \big\}.
\]

We turn now to the proof of (3.6). First note that for any $K > 0$
\[
\limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in C \big) \leq \max\Big\{ \limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in C \cap \mathcal{B}_K \big),\ \limsup_{n\to\infty} \frac{1}{n} \log P\big( Z_n \notin \mathcal{B}_K \big) \Big\}
\]
\[
\leq \max\Big\{ - \inf_{x \in C \cap \mathcal{B}_K} \big\{ \Phi^*(x) \big\},\ -C_K \Big\},
\]
where we used (3.10) and (3.12). From (3.9), we know that for any $K$ large enough
\[
\inf_{x \in C \cap \mathcal{B}_K} \big\{ \Phi^*(x) \big\} = \inf_{x \in C} \big\{ \Phi^*(x) \big\}.
\]
Since $C_K$ tends to infinity as $K$ diverges, (3.6) follows by choosing $K$ large enough.

Step 4. Lower bound. Given an exposed point $x$ in $\mathbb{R}^d$ such that $\Phi^*(x) < \infty$, we are going to prove the lower bound (3.7):
\[
\forall \varepsilon > 0, \qquad \liminf_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in B(x,\varepsilon) \big) \geq -\Phi^*(x).
\]
Let $v \in \mathbb{R}^d$ be a vector as in (3.5), for which
\[
\forall y \neq x, \qquad \Phi^*(y) - \Phi^*(x) > \big\langle v \cdot (y-x) \big\rangle.
\]
Thanks to Proposition 1.6 and the semi-continuity of $\Phi$, we deduce that
\[
\Phi(v) = \sup_{y \in \mathbb{R}^d} \big\{ \langle v \cdot y \rangle - \Phi^*(y) \big\} = \langle v \cdot x \rangle - \Phi^*(x) < \infty.
\]


Thus Assumption 3.2 implies that $E\big( \exp( n \langle v \cdot Z_n \rangle ) \big)$ is well defined for $n$ sufficiently large.

We follow the proof of the lower bound in Cramer's Theorem 1.1 and consider the tilted measure $\widetilde\mu_n(dz)$ in $\mathbb{R}^d$ defined as
\[
\widetilde\mu_n(dz) = \frac{\exp( n \langle v \cdot z \rangle )}{E\big( \exp( n \langle v \cdot Z_n \rangle ) \big)}\, \mu_n(dz), \tag{3.13}
\]
where $\mu_n(dz)$ stands for the law of $Z_n$. We perform the change of measure
\[
P\big( Z_n \in B(x,\varepsilon') \big) = \int_{B(x,\varepsilon')} \mu_n(dz) = E\big( \exp( n \langle v \cdot Z_n \rangle ) \big) \int_{B(x,\varepsilon')} \widetilde\mu_n(dz)\, \exp\big( -n \langle v \cdot z \rangle \big)
\]
\[
= \exp\Big( -n \langle v \cdot x \rangle + \log E\big( \exp( n \langle v \cdot Z_n \rangle ) \big) + n O(\varepsilon') \Big) \int_{B(x,\varepsilon')} \widetilde\mu_n(dz).
\]

Assuming, for a moment, that the tilted measure concentrates close to $x$, i.e.
\[
\forall \varepsilon' > 0, \qquad \lim_{n\to\infty} \int_{B(x,\varepsilon')} \widetilde\mu_n(dz) = 1, \tag{3.14}
\]
we deduce that for any $\varepsilon' \leq \varepsilon$
\[
\liminf_{n\to\infty} \frac{1}{n} \log P\big( Z_n \in B(x,\varepsilon) \big) \geq -\langle v \cdot x \rangle + \Phi(v) + O(\varepsilon') \geq -\Phi^*(x) + O(\varepsilon').
\]
Letting $\varepsilon'$ tend to 0, the lower bound (3.7) is complete.

It remains to prove (3.14), which is more delicate than for Cramer's Theorem, as one cannot rely on the law of large numbers. The idea is to use the large deviation upper bound in order to recover an analogue of the law of large numbers. For this, we compute the Laplace transform associated to $\widetilde\mu_n$. Given $\lambda$ in $\mathbb{R}^d$, we write
\[
\int \widetilde\mu_n(dz) \exp\big( n \langle \lambda \cdot z \rangle \big) = \int \mu_n(dz)\, \frac{\exp\big( n \langle (\lambda+v) \cdot z \rangle \big)}{E\big( \exp( n \langle v \cdot Z_n \rangle ) \big)}.
\]
From Assumption 3.2, we deduce that
\[
\widetilde\Phi(\lambda) = \lim_{n\to\infty} \frac{1}{n} \log \int \widetilde\mu_n(dz) \exp\big( n \langle \lambda \cdot z \rangle \big) = \Phi(\lambda+v) - \Phi(v),
\]
so that the Legendre transform satisfies
\[
\widetilde\Phi^*(z) = \sup_{\lambda \in \mathbb{R}^d} \big\{ \langle \lambda \cdot z \rangle - \Phi(\lambda+v) + \Phi(v) \big\} = \Phi^*(z) - \langle v \cdot z \rangle + \Phi(v).
\]
Since $\Phi(v) = \langle v \cdot x \rangle - \Phi^*(x)$, this leads to
\[
\forall z \neq x, \qquad \widetilde\Phi^*(z) = \Phi^*(z) - \Phi^*(x) - \big\langle v \cdot (z-x) \big\rangle > 0,
\]


where the inequality follows from the fact that $x$ is an exposed point. From the lower semi-continuity of $\widetilde\Phi^*$ (and the fact that it diverges at infinity, as in (3.9)), we obtain that
\[
\forall \varepsilon' > 0, \qquad \inf_{z \notin B(x,\varepsilon')} \widetilde\Phi^*(z) > 0.
\]
For any $\varepsilon' > 0$, the upper bound (3.6) applied to $\widetilde\mu_n$ implies that
\[
\limsup_{n\to\infty} \frac{1}{n} \log \int_{B(x,\varepsilon')^c} \widetilde\mu_n(dz) \leq - \inf_{z \notin B(x,\varepsilon')} \widetilde\Phi^*(z) < 0,
\]
which concludes the proof of (3.14).

3.3 Curie-Weiss model (Episode II)

The Curie-Weiss model defines a mean-field measure on the set of configurations $\Sigma_n = \{\sigma_i\}_{i\leq n}$ in $\{-1,1\}^n$, depending on the inverse temperature $\beta$:
\[
\mu_{\beta,n}(\Sigma_n) = \frac{1}{Z_{\beta,n}} \exp\big( -\beta\, H(\Sigma_n) \big), \tag{3.15}
\]
with $Z_{\beta,n}$ the normalization constant, and where the Hamiltonian of the system is
\[
H(\Sigma_n) = -\frac{1}{2n} \sum_{i,j \leq n} \sigma_i \sigma_j = -\frac{n}{2}\, (Z_n)^2 \qquad \text{with} \quad Z_n = \frac{1}{n} \sum_{i \leq n} \sigma_i.
\]

Under $\mu_{\beta,n}$ the variables $\sigma_i$ are not independent, and one may wonder whether the Gärtner-Ellis Theorem 3.3 applies. We first check Assumption 3.2:
\[
E_{\mu_{\beta,n}}\big( \exp( n\lambda Z_n ) \big) = \frac{ E_{\nu_n}\big( \exp\big( n\lambda Z_n + \frac{n}{2}\beta (Z_n)^2 \big) \big) }{ E_{\nu_n}\big( \exp\big( \frac{n}{2}\beta (Z_n)^2 \big) \big) },
\]
where $\nu_n$ is the product Bernoulli measure $\otimes_{i=1}^n \big( \frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1} \big)$. Recall that the large deviation function associated to $\nu_n$ is
\[
\forall x \in [-1,1], \qquad U(x) = \frac{1+x}{2} \log(1+x) + \frac{1-x}{2} \log(1-x).
\]

Applying Varadhan’s Theorem, we deduce the convergence

limn→∞

1n

log Eµβ,n

(exp(nλZn)

)= sup

x∈[−1,1]

{λx +

β

2x2 −U(x)

}− sup

x∈[−1,1]

2x2 −U(x)

}.

Denoting the large deviation function associated to µβ,n by

Iβ(x) =

{U(x)− β

2 x2 − infy∈[−1,1]

{U(y)− β

2 y2}

, x ∈ [−1, 1]

+∞, x 6∈ [−1, 1],


we have shown the validity of Assumption 3.2:
\[
\forall \lambda \in \mathbb{R}, \qquad \Phi(\lambda) := \lim_{n\to\infty} \frac{1}{n} \log E_{\mu_{\beta,n}}\big( \exp( n\lambda Z_n ) \big) = \sup_{x \in \mathbb{R}} \big\{ \lambda x - I_\beta(x) \big\} = I_\beta^*(\lambda). \tag{3.16}
\]
Thus the upper bound of the Gärtner-Ellis Theorem 3.3 applies with the function $\Phi^* = I_\beta^{**}$, which we now compute:

• If $\beta \leq 1$: we know that $I_\beta$ is strictly convex and lower semi-continuous, so that $I_\beta^{**} = I_\beta$ by Proposition 1.6. In particular, $\Phi^* = I_\beta$ is strictly convex and any point $x \in\, ]-1,1[$ is exposed (see (3.5)). The lower bound of the Gärtner-Ellis Theorem 3.3 applies and we recover that $I_\beta$ is the large deviation function associated to $\mu_{\beta,n}$.


Figure 3.2: The graphs of $I_\beta$, $I_\beta^{**}$ and $I_\beta^*$ are depicted. In the center, the graph of $I_\beta$, represented in dashed line, is above $I_\beta^{**}$ in $]-m_\beta, m_\beta[$ and they coincide outside. As a consequence of the non-convexity of $I_\beta$, the Legendre transform $I_\beta^*$ has a singularity at 0, with left and right derivatives given by $-m_\beta$ and $m_\beta$.

• If $\beta > 1$: the occurrence of a phase transition is characterized by the non-convexity of $I_\beta$ in the interval $[-m_\beta, m_\beta]$. As a consequence, we get
\[
\forall x \in\, ]-m_\beta, m_\beta[, \qquad I_\beta^{**}(x) = 0 < I_\beta(x).
\]
Thus the lower bound of the Gärtner-Ellis Theorem 3.3 does not apply in $]-m_\beta, m_\beta[$, and the large deviation principle can be recovered only for sets in $[-1, -m_\beta] \cup [m_\beta, 1]$, where $I_\beta^{**} = I_\beta$.


Chapter 4

Sample path large deviations

4.1 Random walk large deviations

4.1.1 Mogulskii’s Theorem

The previous chapters were devoted to the large deviations of the empirical mean of i.i.d. random variables $\{X_i\}_{i\geq 1}$. Under the assumption
\[
\forall \lambda \in \mathbb{R}, \qquad \varphi(\lambda) = \log E\big( \exp(\lambda X_1) \big) < \infty, \tag{4.1}
\]
the large deviation rate function is given by the Legendre transform $\varphi^*$. We are now going to investigate the large deviations for the sample path
\[
t \in [0,1] \ \mapsto\ \frac{1}{n} \sum_{i=1}^{\lfloor nt \rfloor} X_i.
\]

We define $\{Z_n(t)\}_{t\in[0,1]}$ as the continuous process obtained from this path by linear interpolation, and denote by $\mu_n$ the law of $\{Z_n(t)\}_{t\in[0,1]}$, which is supported by the set of continuous functions
\[
C_0\big([0,1], \mathbb{R}\big) = \big\{ f \in C\big([0,1], \mathbb{R}\big) \ \text{such that} \ f(0) = 0 \big\}
\]
equipped with the supremum topology
\[
\| f - g \|_\infty = \sup_{0 \leq s \leq 1} \big\{ | f(s) - g(s) | \big\}.
\]

The subset $C_{\mathrm{abs}}$ of absolutely continuous functions will play a key role in the following. We state below two equivalent representations of $C_{\mathrm{abs}}$:
\[
C_{\mathrm{abs}} = \Big\{ f \in C_0\big([0,1], \mathbb{R}\big) \ \Big|\ \exists \dot f \in L^1([0,1]) \ \text{such that} \ f(t) = \int_0^t ds\, \dot f(s) \Big\} \tag{4.2}
\]
\[
= \Big\{ f \in C_0\big([0,1], \mathbb{R}\big) \ \Big|\ \forall \varepsilon > 0,\ \exists \delta > 0 \ \text{such that for any collection } \{ ]a_k, b_k[ \}_{k \leq n} \text{ of disjoint open intervals,} \ \sum_{k=1}^n |b_k - a_k| \leq \delta \ \Rightarrow\ \sum_{k=1}^n | f(b_k) - f(a_k) | \leq \varepsilon \Big\}.
\]

We can now state the sample path large deviation Theorem



Theorem 4.1 (Mogulskii). Under the assumption (4.1), the sequence of measures $\{\mu_n\}_{n\geq 1}$ obeys a large deviation principle on $C_0\big([0,1], \mathbb{R}\big)$ with rate function
\[
G(f) =
\begin{cases}
\int_0^1 ds\, \varphi^*\big( \dot f(s) \big), & \text{if} \ f \in C_{\mathrm{abs}}, \\
\infty, & \text{otherwise}.
\end{cases} \tag{4.3}
\]
Theorem 4.1 is derived in the following sections. As a consequence, the exponential cost of maintaining the process $\{Z_n(t)\}_{t\in[0,1]}$ close to a function $f$ which is not absolutely continuous is infinite.

4.1.2 A local large deviation upper-bound

We are going to prove an upper bound for small neighborhoods of a given function.

Proposition 4.2. Let $f$ be a function in $C_0\big([0,1], \mathbb{R}\big)$; then
\[
\lim_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \log P\big( \| Z_n - f \|_\infty \leq \varepsilon \big) \leq -G(f). \tag{4.4}
\]
The remainder of the section is devoted to the proof of this proposition. We proceed in two steps.

Step 1. Upper bound on finite-dimensional marginal distributions. Given $k \geq 1$, we consider a finite collection of increasing times $T = \{t_1, \dots, t_k\}$ in $[0,1]^k$. We will first study the large deviations of the finite-dimensional marginals $\{Z_n(t_1), \dots, Z_n(t_k)\}$.

Lemma 4.3. The finite-dimensional marginals $\{Z_n(t_1), \dots, Z_n(t_k)\}$ obey a large deviation principle in $\mathbb{R}^k$ with rate function
\[
\forall \{z_1, \dots, z_k\} \in \mathbb{R}^k, \qquad G_T(z_1, \dots, z_k) = \sum_{i=1}^k (t_i - t_{i-1})\, \varphi^*\Big( \frac{z_i - z_{i-1}}{t_i - t_{i-1}} \Big), \tag{4.5}
\]
where $t_0 = 0$ and $z_0 = 0$.


Figure 4.1: The marginals $\{Z_n(t_i)\}_{i\leq 3}$ are constrained to be close to $z_i$ at time $t_i$.


A first upper bound on the large deviations of the whole path can be obtained as a consequence of this lemma:
\[
\limsup_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \log P\big( \| Z_n - f \|_\infty \leq \varepsilon \big) \leq \limsup_{\varepsilon\to 0} \limsup_{n\to\infty} \frac{1}{n} \log P\Big( \bigcap_{i=1}^k \big\{ | Z_n(t_i) - f(t_i) | \leq \varepsilon \big\} \Big)
\]
\[
\leq -G_T\big( f(t_1), \dots, f(t_k) \big). \tag{4.6}
\]
We now derive Lemma 4.3; the bound (4.6) will be used in the next step.

Proof of Lemma 4.3. Instead of studying the large deviations of $\{Z_n(t_1), \dots, Z_n(t_k)\}$, it is equivalent to work with the increments
\[
Y_{n,T} = \big\{ Z_n(t_1),\ Z_n(t_2) - Z_n(t_1),\ \dots,\ Z_n(t_k) - Z_n(t_{k-1}) \big\},
\]
which are almost independent (there can be some small correlations due to the discretization of the process $Z_n$). Indeed, one has
\[
\bigcap_{i=1}^k \big\{ Z_n(t_i) = z_i \big\} = \bigcap_{i=1}^k \big\{ Z_n(t_i) - Z_n(t_{i-1}) = z_i - z_{i-1} \big\}.
\]

To apply the Gärtner-Ellis Theorem 3.3, it is enough to compute, for any $\lambda \in \mathbb{R}^k$,
\[
\Phi(\lambda) = \lim_{n\to\infty} \frac{1}{n} \log E\big( \exp( n \langle \lambda \cdot Y_{n,T} \rangle ) \big) = \sum_{i=1}^k (t_i - t_{i-1})\, \varphi(\lambda_i). \tag{4.7}
\]
Note that $Z_n(t)$ is defined as the linear interpolation of a discrete random walk; thus the variables $\{Z_n(t_{i+1}) - Z_n(t_i)\}_{i\leq k}$ may not be exactly independent. However, their correlations are weak and the limit (4.7) holds as if they were independent.

At α = {α1, . . . , αk}, the Legendre transform is given by

Φ∗(α) = supλ∈Rk

{ k

∑i=1

λiαi − (ti − ti−1)ϕ(λi)}=

k

∑i=1

(ti − ti−1) supλi∈R

{λi

αi

ti − ti−1− ϕ(λi)

}=

k

∑i=1

(ti − ti−1)ϕ∗(

αi

ti − ti−1

).

As the function Φ∗ is strictly convex, Gärtner-Ellis Theorem 3.3 implies that the variablesYn,T obey a large deviation principle with rate function Φ∗. By changing variables, therate function GT is recovered and Lemma 4.3 proved.
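The coordinate-wise decoupling of $\Phi^*$ can be illustrated numerically. The sketch below is an illustration only (not part of the proof): it assumes Gaussian increments, $\varphi(\lambda) = \lambda^2/2$, computes each $\varphi^*$ by a crude grid search, and compares the result with the closed form $\sum_i \alpha_i^2 / (2(t_i - t_{i-1}))$.

```python
import math

# Grid-search illustration of the Legendre transform computation in the proof,
# assuming Gaussian increments: phi(lambda) = lambda^2 / 2, phi*(x) = x^2 / 2,
# so that Phi* decouples coordinate by coordinate.

def phi(lam):
    return 0.5 * lam * lam

def legendre(x, lo=-50.0, hi=50.0, steps=20001):
    """Crude numerical sup over lambda of {lambda * x - phi(lambda)}."""
    h = (hi - lo) / (steps - 1)
    return max((lo + i * h) * x - phi(lo + i * h) for i in range(steps))

def Phi_star(alpha, times):
    """sum_i (t_i - t_{i-1}) phi*(alpha_i / (t_i - t_{i-1})), via the grid sup."""
    t_prev, total = 0.0, 0.0
    for a, t in zip(alpha, times):
        dt = t - t_prev
        total += dt * legendre(a / dt)
        t_prev = t
    return total

times = [0.25, 0.5, 1.0]
alpha = [0.1, -0.2, 0.3]
exact = sum(a * a / (2.0 * (t - s))
            for a, s, t in zip(alpha, [0.0] + times[:-1], times))
assert abs(Phi_star(alpha, times) - exact) < 1e-3
```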

Step 2. Approximation of the rate functional $G$.

If a sample path $Z_n$ is close to $f$ in the $\|\cdot\|_\infty$-norm, then for an arbitrary time discretization $\{t_1, \dots, t_k\}$, the variables $Z_n(t_i)$ and $f(t_i)$ have to be close. Thus the upper bound (4.4) will follow from (4.6) provided one can prove the following result.


Lemma 4.4. For any $f$ in $C_0([0,1], \mathbb{R})$, the large deviation functional $G(f)$ coincides with the maximal cost over all possible discretizations:
$$
G(f) = \sup_{\substack{k \geq 1 \\ T = \{t_1, \dots, t_k\}}} G_T\big( f(t_1), \dots, f(t_k) \big). \qquad (4.8)
$$

Proof. Step 1. We will first prove that for any $T = \{t_1, \dots, t_k\}$
$$
G_T\big( f(t_1), \dots, f(t_k) \big) \leq G(f). \qquad (4.9)
$$
It is enough to consider $f$ absolutely continuous, i.e. $f(t) = \int_0^t ds\, \dot f(s)$, since otherwise $G(f) = \infty$ and (4.9) is trivial. Using the convexity of the rate function $\varphi^*$ and Jensen's inequality, we get
$$
\varphi^*\Big( \frac{f(t_i) - f(t_{i-1})}{t_i - t_{i-1}} \Big)
= \varphi^*\Big( \frac{1}{t_i - t_{i-1}} \int_{t_{i-1}}^{t_i} ds\, \dot f(s) \Big)
\leq \frac{1}{t_i - t_{i-1}} \int_{t_{i-1}}^{t_i} ds\, \varphi^*\big( \dot f(s) \big).
$$
Summing over $i$, and using that $\varphi^*$ is nonnegative, leads to
$$
G_T\big( f(t_1), \dots, f(t_k) \big) \leq \int_0^{t_k} ds\, \varphi^*\big( \dot f(s) \big) \leq G(f).
$$
Thus (4.9) holds.
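The Jensen step above can be observed numerically. The sketch below (all names illustrative, not from the text) assumes the Gaussian rate $\varphi^*(x) = x^2/2$ and the test path $f(t) = t^2$, for which $G(f) = \frac{1}{2}\int_0^1 (2s)^2\, ds = \frac{2}{3}$; every discretized cost stays below $G(f)$, whatever the partition.

```python
import random

# Numerical illustration of the convexity step (4.9), assuming the Gaussian
# rate phi*(x) = x^2 / 2 and the test path f(t) = t^2, so that
# G(f) = 0.5 * integral of (2 s)^2 ds = 2/3.

def G_T(f, times):
    """Discretized cost (4.5) with phi*(x) = x^2 / 2 and t_0 = z_0 = 0."""
    t_prev, z_prev, total = 0.0, 0.0, 0.0
    for t in times:
        z = f(t)
        total += (t - t_prev) * 0.5 * ((z - z_prev) / (t - t_prev)) ** 2
        t_prev, z_prev = t, z
    return total

f = lambda t: t * t
G_f = 2.0 / 3.0
random.seed(0)
for _ in range(100):                   # random partitions of [0, 1]
    k = random.randint(1, 10)
    times = sorted(random.uniform(0.0, 1.0) for _ in range(k))
    assert G_T(f, times) <= G_f + 1e-12
```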

Step 2. For the reverse inequality, we first consider an absolutely continuous function $f(t) = \int_0^t ds\, \dot f(s)$ with $\dot f \in L^1([0,1])$. We are going to prove that for the sequence $T_k = \{t_1, \dots, t_k\}$ with $t_i = \frac{i}{k}$,
$$
\liminf_{k \to \infty} G_{T_k}\big( f(t_1), \dots, f(t_k) \big) \geq G(f). \qquad (4.10)
$$
We write
$$
G_{T_k}\big( f(t_1), \dots, f(t_k) \big)
= \frac{1}{k} \sum_{i=1}^k \varphi^*\Big( k \Big( f\Big(\frac{i}{k}\Big) - f\Big(\frac{i-1}{k}\Big) \Big) \Big)
= \int_0^1 dt\, \varphi^*\Big( k \int_{\lfloor tk \rfloor / k}^{(\lfloor tk \rfloor + 1)/k} ds\, \dot f(s) \Big).
$$
Recall that by the Lebesgue differentiation theorem,
$$
\text{for almost every } t \in [0,1], \qquad \dot f(t) = \lim_{k \to \infty} k \int_{\lfloor tk \rfloor / k}^{(\lfloor tk \rfloor + 1)/k} ds\, \dot f(s). \qquad (4.11)
$$
Thus, using the lower semi-continuity of $\varphi^*$ and Fatou's lemma, we deduce that
$$
\liminf_{k \to \infty} G_{T_k}\big( f(t_1), \dots, f(t_k) \big)
\geq \int_0^1 dt\, \liminf_{k \to \infty} \varphi^*\Big( k \int_{\lfloor tk \rfloor / k}^{(\lfloor tk \rfloor + 1)/k} ds\, \dot f(s) \Big)
\geq \int_0^1 dt\, \varphi^*\big( \dot f(t) \big) = G(f).
$$


Thus (4.10) is proved.

It remains to consider the case of a function $f$ which is not absolutely continuous. We know from (4.2) that there exists a sequence $\{a_i^{(n)}, b_i^{(n)}\}_{i \leq k_n}$ with $a_i^{(n)} < b_i^{(n)}$ such that for some $\delta > 0$
$$
\lim_{n \to \infty} \sum_{i=1}^{k_n} \big| b_i^{(n)} - a_i^{(n)} \big| = 0
\qquad \text{and} \qquad
\liminf_{n \to \infty} \sum_{i=1}^{k_n} \big| f(b_i^{(n)}) - f(a_i^{(n)}) \big| \geq \delta. \qquad (4.12)
$$
Thus for a family $T_n = \{t_1^{(n)}, \dots, t_{k'_n}^{(n)}\}$ containing the sequence $\{a_i^{(n)}, b_i^{(n)}\}_{i \leq k_n}$, one has
$$
G_{T_n}\big( f(t_1^{(n)}), \dots, f(t_{k'_n}^{(n)}) \big)
\geq \sum_{i=1}^{k_n} \big( b_i^{(n)} - a_i^{(n)} \big)\, \varphi^*\Big( \frac{f(b_i^{(n)}) - f(a_i^{(n)})}{b_i^{(n)} - a_i^{(n)}} \Big).
$$
From the variational formula,
$$
G_{T_n}\big( f(t_1^{(n)}), \dots, f(t_{k'_n}^{(n)}) \big)
\geq \sup_{\lambda_1, \dots, \lambda_{k_n}} \Big\{ \sum_{i=1}^{k_n} \lambda_i \big( f(b_i^{(n)}) - f(a_i^{(n)}) \big) - \big( b_i^{(n)} - a_i^{(n)} \big) \varphi(\lambda_i) \Big\}
\geq \lambda \sum_{i=1}^{k_n} \big| f(b_i^{(n)}) - f(a_i^{(n)}) \big| - \Big[ \sup_{|u| \leq \lambda} \varphi(u) \Big] \sum_{i=1}^{k_n} \big( b_i^{(n)} - a_i^{(n)} \big),
$$
where we chose in the last inequality $\lambda_i = \mathrm{sign}\big( f(b_i^{(n)}) - f(a_i^{(n)}) \big)\, \lambda$. Assumption (4.1) implies that $\sup_{|u| \leq \lambda} \varphi(u) < \infty$ for any $\lambda$. Letting $n \to \infty$, we deduce from (4.12) that
$$
\liminf_{n \to \infty} G_{T_n}\big( f(t_1^{(n)}), \dots, f(t_{k'_n}^{(n)}) \big) \geq \lambda \delta \xrightarrow[\lambda \to \infty]{} \infty.
$$
Since $G(f) = \infty$ in this case, the identity (4.8) also holds when $f$ is not absolutely continuous.
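The sup in (4.8) can also be watched converging along uniform refinements. The sketch below is illustrative: it assumes the Gaussian rate $\varphi^*(x) = x^2/2$ and the hypothetical path $f(t) = \sin(\pi t)$, for which $G(f) = \frac{1}{2}\int_0^1 \pi^2 \cos^2(\pi s)\, ds = \pi^2/4$.

```python
import math

# The costs G_{T_k} along nested uniform discretizations increase to G(f),
# assuming phi*(x) = x^2 / 2 and the hypothetical path f(t) = sin(pi t).

def G_Tk(f, k):
    """Cost G_{T_k} over the uniform discretization t_i = i / k."""
    total = 0.0
    for i in range(1, k + 1):
        incr = f(i / k) - f((i - 1) / k)
        total += (1.0 / k) * 0.5 * (k * incr) ** 2
    return total

f = lambda t: math.sin(math.pi * t)
G_f = math.pi ** 2 / 4.0
vals = [G_Tk(f, k) for k in (2, 8, 32, 128, 512)]  # each partition refines the last
assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))   # monotone by Jensen
assert abs(vals[-1] - G_f) < 1e-3                            # converging to G(f)
```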

4.1.3 Exponential tightness

In this section, we improve the upper bound (4.4) and show that for any closed set $C$ in $C_0([0,1], \mathbb{R})$
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( Z_n \in C \big) \leq - \inf_{f \in C} \big\{ G(f) \big\}. \qquad (4.13)
$$
As in the Gärtner-Ellis Theorem 3.3, the strategy relies on a decomposition of the closed set $C$ into a compact set and a remaining part which will be controlled by the exponential tightness property.

The key step is to derive the exponential tightness.

Proposition 4.5. The sequence of measures $\{\mu_n\}_{n \geq 1}$ is exponentially tight in $C_0([0,1], \mathbb{R})$.

We postpone the proof of Proposition 4.5 and first complete the derivation of (4.13). From Proposition 4.5, one can find for any $K > 0$ a compact set $\mathcal{K}$ such that
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( Z_n \notin \mathcal{K} \big) \leq - K. \qquad (4.14)
$$


Given $\mathcal{K}$, we are going to prove that
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( Z_n \in C \cap \mathcal{K} \big) \leq - \inf_{f \in C \cap \mathcal{K}} \big\{ G(f) \big\}. \qquad (4.15)
$$
Fix $\delta > 0$. For any $f$ in $C \cap \mathcal{K}$, Proposition 4.2 implies that there exists $\varepsilon_f > 0$ such that
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \|Z_n - f\|_\infty < \varepsilon_f \big) \leq - G(f) + \delta.
$$
Since $C \cap \mathcal{K}$ is compact, a finite covering by open balls can be extracted: there are $f_1, \dots, f_M$ in $C \cap \mathcal{K}$ such that
$$
C \cap \mathcal{K} \subset \bigcup_{i=1}^{M} \big\{ g \in C_0([0,1], \mathbb{R}) \text{ such that } \|g - f_i\|_\infty < \varepsilon_{f_i} \big\}.
$$
Thus, we get
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( Z_n \in C \cap \mathcal{K} \big) \leq - \min_{1 \leq i \leq M} \big\{ G(f_i) \big\} + \delta \leq - \inf_{f \in C \cap \mathcal{K}} \big\{ G(f) \big\} + \delta.
$$
Letting $\delta$ tend to 0 completes (4.15). To conclude the proof of (4.13), it is enough to let $K$ go to infinity in (4.14).

We turn now to the proof of Proposition 4.5. The first step is to identify the compact sets of $C_0([0,1], \mathbb{R})$ equipped with the supremum topology. A subset $H$ of $C_0$ is equicontinuous if
$$
\forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } |x - y| \leq \delta \ \Rightarrow\ \sup_{f \in H} \big\{ |f(x) - f(y)| \big\} \leq \varepsilon. \qquad (4.16)
$$
As the functions in $C_0$ satisfy $f(0) = 0$, Ascoli's theorem provides a characterization of the compact sets.

Theorem 4.6 (Ascoli). An equicontinuous subset $H$ of $C_0$ is relatively compact in $C_0$, i.e. its closure is compact.

Using this characterization of compact sets, we turn now to the proof of the exponential tightness.

Using this characterization of compact sets, we turn now the proof of the exponentialtightness.

Proof of Proposition 4.5. Let us start with a simple, but instructive, example and supposefirst that the increments of the process are uniformly bounded |Xi| 6 C for some C > 0,then

∀s, t ∈ [0, 1],∣∣Zn(t)− Zn(s)

∣∣ 6 C|t− s|.Thus any sample path {Zn(t)}t∈[0,1] belongs to the set of Lipschitz functions with Lips-chitz constant C. As this set is equicontinuous, we deduce from Ascoli’s Theorem that µnis supported by a compact set and the exponential tightness property of Proposition 4.5holds.

More generally, one expects that the regularity of the trajectories is controlled by theasymptotic behavior of ϕ∗ since the large deviations are controlled by G( f ) =

∫ 10 ds ϕ∗

(f (s)

).


For example, if $\varphi^*(x)$ behaves asymptotically as $|x|^p$ for some $p > 1$, then any function $f$ for which $G(f)$ is finite will be Hölder continuous of order $\frac{p-1}{p}$, as $\dot f$ belongs to $L^p([0,1])$. Indeed, we deduce from Hölder's inequality that
$$
| f(t) - f(s) | = \Big| \int_s^t du\, \dot f(u) \Big| \leq |t - s|^{1 - 1/p} \Big( \int_s^t du\, |\dot f(u)|^p \Big)^{1/p} \leq \|\dot f\|_{L^p}\, |t - s|^{(p-1)/p}. \qquad (4.17)
$$
As a consequence, for any $M > 0$, the set of functions $\{f,\ G(f) \leq M\}$ is equicontinuous and therefore relatively compact. The following proof rests on this intuition.

Assumption (4.1) implies that $\varphi^*$ diverges faster than linearly. Indeed, for any $\lambda > 0$
$$
\varphi^*(x) \geq \max\big\{ \lambda x - \varphi(\lambda),\ -\lambda x - \varphi(-\lambda) \big\},
$$
so that by choosing $\lambda$ arbitrarily large, we deduce that
$$
\lim_{x \to \pm\infty} \frac{\varphi^*(x)}{|x|} = \infty. \qquad (4.18)
$$
Thus one can find two sequences $\{\delta_k, \varepsilon_k\}_{k \geq 1}$ such that
$$
\lim_{k \to \infty} \delta_k = \lim_{k \to \infty} \varepsilon_k = 0
\qquad \text{and} \qquad
\delta_k\, \varphi^*\Big( \pm \frac{\varepsilon_k}{\delta_k} + \mathbb{E}(X_1) \Big) \geq k, \qquad (4.19)
$$
the last inequality holding for both choices of sign. We define, for any integer $k$, the compact sets
$$
\mathcal{K}_k = \big\{ f \in C_0([0,1], \mathbb{R}) :\ \forall \ell \geq k,\ |x - y| \leq \delta_\ell \Rightarrow |f(x) - f(y)| \leq \varepsilon_\ell \big\}. \qquad (4.20)
$$
Since the sequences $\{\delta_\ell, \varepsilon_\ell\}_{\ell \geq 1}$ vanish, the sets $\mathcal{K}_k$ are equicontinuous and therefore compact. The regularity of the functions in $\mathcal{K}_k$ is tuned in (4.19) in order to be compatible with the regularity imposed by $\varphi^*$. For example, if $\varphi^*$ behaves as $|x|^p$ then one expects $\delta_k^{1-p} \varepsilon_k^p = k$, so that one has a regularity as in (4.17): $\varepsilon_k = k^{1/p} \delta_k^{(p-1)/p}$. Note that the extra factor $k^{1/p}$ has been chosen to simplify the proof later on.

Deriving the exponential tightness boils down to proving that
$$
\lim_{k \to \infty} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( Z_n \notin \mathcal{K}_k \big) = -\infty.
$$
We get
$$
\mathbb{P}\big( Z_n \notin \mathcal{K}_k \big) \leq \mathbb{P}\Big( \exists \ell \geq k,\ \sup_{t \in [0,1]}\ \sup_{0 \leq s \leq \delta_\ell} |Z_n(t+s) - Z_n(t)| > \varepsilon_\ell \Big).
$$
Below the scale $1/n$, the sample paths are linear; thus one has to distinguish the large scale regularity from the small scale regularity. We set
$$
\ell_n = \inf\big\{ \ell \in \mathbb{N},\ \delta_\ell \leq 1/n \big\}.
$$
As the sequence $\{\frac{\varepsilon_\ell}{\delta_\ell}\}_{\ell \geq 1}$ diverges, one has, for any fixed $n$,
$$
\sup_{i \leq n} \frac{|X_i|}{n} \leq \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}}
\ \Rightarrow\ \forall \ell \geq \ell_n,\ \sup_{t \in [0,1]}\ \sup_{0 \leq s \leq \delta_\ell} |Z_n(t+s) - Z_n(t)| \leq \varepsilon_\ell.
$$


Thus the constraints on the scales smaller than $\delta_{\ell_n}$ are not relevant. This leads to
$$
\mathbb{P}\big( Z_n \notin \mathcal{K}_k \big)
\leq \mathbb{P}\Big( \exists \ell \leq \ell_n - 1,\ \exists i \leq n,\ \sup_{0 \leq j \leq n \delta_\ell} \Big| Z_n\Big( \frac{i+j}{n} \Big) - Z_n\Big( \frac{i}{n} \Big) \Big| > \varepsilon_\ell \Big)
+ \mathbb{P}\Big( \sup_{i \leq n} \frac{|X_i|}{n} > \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big)
$$
$$
\leq n \sum_{\ell = k}^{\ell_n - 1} \mathbb{P}\Big( \sup_{0 \leq j \leq n \delta_\ell} \Big| Z_n\Big( \frac{j}{n} \Big) \Big| > \varepsilon_\ell \Big)
+ n\, \mathbb{P}\Big( |X_1| > n \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big), \qquad (4.21)
$$
where we used that $Z_n\big( \frac{i+j}{n} \big) - Z_n\big( \frac{i}{n} \big)$ has the same law as $Z_n\big( \frac{j}{n} \big)$. The last term can be easily estimated by
$$
\mathbb{P}\Big( |X_1| > n \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big) \leq \exp\Big( - \varphi^*\Big( n \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big) \Big),
$$
which vanishes super-exponentially fast as $\varphi^*$ grows faster than linearly.

The following lemma will be key to control the fluctuations of the trajectories.

Lemma 4.7. For any $u > 0$, one has for any $N \geq 1$
$$
\mathbb{P}\Big( \sup_{k \leq N} \big| S_k \big| > N u \Big)
\leq \exp\big( - N \varphi^*( u + \mathbb{E}(X_1) ) \big) + \exp\big( - N \varphi^*( -u + \mathbb{E}(X_1) ) \big),
$$
with $S_k = \sum_{i=1}^k X_i - k\, \mathbb{E}(X_1)$.
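Lemma 4.7 can be tested by simulation. The sketch below uses Bernoulli$(1/2)$ increments as an illustrative assumption ($X_i \in \{0,1\}$, $\mathbb{E}(X_1) = 1/2$), for which $\varphi^*(x) = x \log(2x) + (1-x)\log(2(1-x))$ on $(0,1)$, and checks that the empirical excursion probability of the centred walk stays below the bound.

```python
import math, random

# Monte Carlo check of Lemma 4.7, assuming Bernoulli(1/2) increments:
# X_i in {0, 1}, E(X_1) = 1/2, phi*(x) = x log(2x) + (1-x) log(2(1-x)).

def phi_star(x):
    return x * math.log(2.0 * x) + (1.0 - x) * math.log(2.0 * (1.0 - x))

N, u = 50, 0.1
bound = math.exp(-N * phi_star(u + 0.5)) + math.exp(-N * phi_star(-u + 0.5))

random.seed(1)
hits, trials = 0, 20000
for _ in range(trials):
    s, top = 0.0, 0.0
    for _ in range(N):
        s += random.randint(0, 1) - 0.5        # centred increment X_i - E(X_1)
        top = max(top, abs(s))
    hits += top > N * u
p_hat = hits / trials
assert p_hat <= bound                          # empirical probability vs Lemma 4.7
```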

We postpone the derivation of Lemma 4.7 and conclude first the proof of Proposition 4.5. Thanks to Lemma 4.7 applied with $N = n \delta_\ell$, we deduce from (4.21)
$$
\mathbb{P}\big( Z_n \notin \mathcal{K}_k \big)
\leq n \sum_{\ell \geq k} \exp\Big( - n \delta_\ell\, \varphi^*\Big( \pm \frac{\varepsilon_\ell}{\delta_\ell} + \mathbb{E}(X_1) \Big) \Big)
+ n \exp\Big( - \varphi^*\Big( n \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big) \Big)
\leq C n \exp(-nk) + n \exp\Big( - \varphi^*\Big( n \frac{\varepsilon_{\ell_n}}{\delta_{\ell_n}} \Big) \Big),
$$
where we used (4.19) in the last inequality. This implies the exponential tightness and concludes the proof of Proposition 4.5.
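In the Gaussian case, the sequences required in (4.19) can be written down explicitly. The sketch below assumes centred standard Gaussian increments ($\mathbb{E}(X_1) = 0$, $\varphi^*(x) = x^2/2$) and takes, up to constants, the choice $\varepsilon_k = k^{1/p} \delta_k^{(p-1)/p}$ with $p = 2$ mentioned after (4.20).

```python
import math

# Explicit sequences for (4.19), assuming centred standard Gaussian increments
# (E(X_1) = 0, phi*(x) = x^2 / 2).  With p = 2, take delta_k = k**-3 and
# eps_k = sqrt(2 k delta_k); then both sequences vanish while
# delta_k * phi*(eps_k / delta_k) = eps_k**2 / (2 delta_k) = k.

for k in range(1, 200):
    delta = k ** -3.0
    eps = math.sqrt(2.0 * k * delta)
    cost = delta * 0.5 * (eps / delta) ** 2        # delta_k * phi*(eps_k / delta_k)
    assert abs(cost - k) < 1e-8 * k                # equals k, up to rounding
    assert eps <= math.sqrt(2.0) / k + 1e-12 and delta <= 1.0   # both vanish
```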

Proof of Lemma 4.7. Since the variables $S_k$ have been centered, we set
$$
\psi(\lambda) = \log \mathbb{E}\Big( \exp\big( \lambda ( X_1 - \mathbb{E}(X_1) ) \big) \Big) = \varphi(\lambda) - \lambda\, \mathbb{E}(X_1)
\qquad \text{and} \qquad
\psi^*(x) = \varphi^*\big( x + \mathbb{E}(X_1) \big).
$$
First note that
$$
\mathbb{P}\Big( \sup_{k \leq N} |S_k| > N u \Big) \leq \mathbb{P}\Big( \sup_{k \leq N} S_k > N u \Big) + \mathbb{P}\Big( \inf_{k \leq N} S_k < - N u \Big).
$$
Both terms are of the same nature and it is enough to consider the first one. From the exponential Chebyshev inequality, we get for any $\lambda > 0$
$$
\mathbb{P}\Big( \sup_{k \leq N} S_k > N u \Big) \leq \mathbb{P}\Big( \sup_{k \leq N} \exp( \lambda S_k ) > \exp( \lambda N u ) \Big)
\leq \mathbb{P}\Big( \sup_{k \leq N} M_k > \exp\big( N ( \lambda u - \psi(\lambda) ) \big) \Big),
$$


where
$$
M_k = \exp\big( \lambda S_k - k\, \psi(\lambda) \big)
$$
is a martingale, and we used that $\psi(\lambda) \geq 0$ (by Jensen's inequality, since $S_k$ is centered).

We recall Doob's maximal inequality for a nonnegative martingale $\{M_k\}_{k \geq 1}$:
$$
\mathbb{P}\Big( \sup_{k \leq N} M_k > c \Big) \leq \frac{1}{c}\, \mathbb{E}\Big( M_N\, \mathbf{1}_{\{ \sup_{k \leq N} M_k > c \}} \Big) \leq \frac{1}{c}\, \mathbb{E}( M_N ) = \frac{1}{c}\, \mathbb{E}( M_1 ). \qquad (4.22)
$$
Since $\mathbb{E}(M_1) = 1$, we deduce that
$$
\mathbb{P}\Big( \sup_{k \leq N} S_k > N u \Big) \leq \exp\big( - N ( \lambda u - \psi(\lambda) ) \big).
$$
Optimizing with respect to $\lambda > 0$ yields the rate $\psi^*(u) = \varphi^*( u + \mathbb{E}(X_1) )$, which concludes the proof of Lemma 4.7.
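Doob's inequality (4.22) for the exponential martingale can itself be illustrated by simulation. The sketch below assumes centred standard Gaussian increments, so that $\psi(\lambda) = \lambda^2/2$ and $\mathbb{E}(M_1) = 1$; the parameters are arbitrary.

```python
import math, random

# Monte Carlo illustration of Doob's inequality for the exponential martingale
# M_k = exp(lambda * S_k - k * psi(lambda)), assuming centred standard Gaussian
# increments (psi(lambda) = lambda^2 / 2, E(M_1) = 1).

random.seed(2)
lam, N, c = 1.0, 30, 10.0
hits, trials = 0, 20000
for _ in range(trials):
    s, top = 0.0, 0.0
    for k in range(1, N + 1):
        s += random.gauss(0.0, 1.0)                 # centred increment
        top = max(top, math.exp(lam * s - k * lam * lam / 2.0))
    hits += top > c
assert hits / trials <= 1.0 / c    # Doob: P(sup_k M_k > c) <= E(M_N)/c = 1/c
```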

4.1.4 Proof of the lower bound

We turn now to the derivation of the lower bound
$$
\forall \varepsilon > 0, \qquad \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \|Z_n - f\|_\infty \leq \varepsilon \big) \geq - G(f). \qquad (4.23)
$$
It is enough to consider the case $G(f) < \infty$. Then $f$ is absolutely continuous and there exists $\dot f$ in $L^1([0,1])$ such that $f(t) = \int_0^t ds\, \dot f(s)$. For any $\varepsilon > 0$, one can find $K \in \mathbb{N}$ and an increasing time sequence $\{t_1, \dots, t_K\}$ such that
$$
\forall i \leq K, \quad \int_{t_{i-1}}^{t_i} ds\, |\dot f(s)| \leq \frac{\varepsilon}{3}
\quad \text{so that} \quad \forall s \in [t_{i-1}, t_i],\ |f(s) - f(t_{i-1})| \leq \frac{\varepsilon}{3}, \qquad (4.24)
$$
and
$$
\forall i \leq K, \quad |t_i - t_{i-1}| \leq \frac{\delta(\varepsilon)}{3}, \qquad (4.25)
$$
for some function $\delta(\varepsilon)$ satisfying the following condition (see (4.19))
$$
\lim_{\varepsilon \to 0} \delta(\varepsilon) = 0
\qquad \text{and} \qquad
\lim_{\varepsilon \to 0} \delta(\varepsilon)\, \varphi^*\Big( \pm \frac{\varepsilon}{\delta(\varepsilon)} \Big) = \infty. \qquad (4.26)
$$
For $s$ in $[t_{i-1}, t_i]$, one has
$$
\big| Z_n(s) - f(s) \big| \leq \big| Z_n(s) - Z_n(t_{i-1}) \big| + \big| Z_n(t_{i-1}) - f(t_{i-1}) \big| + \big| f(s) - f(t_{i-1}) \big|.
$$
The oscillations of $f$ in each interval $[t_{i-1}, t_i]$ are controlled by (4.24); thus, if $Z_n(t_{i-1})$ and $f(t_{i-1})$ are close and $Z_n$ does not fluctuate too much in $[t_{i-1}, t_i]$, then both trajectories remain close:
$$
\Big\{ \big| Z_n(t_{i-1}) - f(t_{i-1}) \big| \leq \frac{\varepsilon}{3} \Big\}
\bigcap
\Big\{ \forall s \in [t_{i-1}, t_i],\ \big| Z_n(t_{i-1}) - Z_n(s) \big| \leq \frac{\varepsilon}{3} \Big\}
\subset
\Big\{ \forall s \in [t_{i-1}, t_i],\ \big| Z_n(s) - f(s) \big| \leq \varepsilon \Big\}.
$$


Thus
$$
\mathbb{P}\big( \|Z_n - f\|_\infty \leq \varepsilon \big)
\geq \mathbb{P}\Big( \bigcap_{i=1}^K \Big\{ \big| Z_n(t_{i-1}) - f(t_{i-1}) \big| \leq \frac{\varepsilon}{3} \Big\} \Big)
- \mathbb{P}\Big( \bigcup_{i=1}^K \Big\{ \sup_{s \in [t_{i-1}, t_i]} \big| Z_n(t_{i-1}) - Z_n(s) \big| > \frac{\varepsilon}{3} \Big\} \Big).
$$
We first check that the small scale fluctuations vanish super-exponentially fast when $\varepsilon$ tends to 0. We claim that for $i \leq K$
$$
\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\Big( \sup_{s \in [t_{i-1}, t_i]} \big| Z_n(s) - Z_n(t_{i-1}) \big| > \frac{\varepsilon}{3} \Big)
\leq - \delta(\varepsilon)\, \varphi^*\Big( \pm \frac{\varepsilon}{\delta(\varepsilon)} \Big).
$$
This inequality is a consequence of Lemma 4.7, and the divergence when $\varepsilon$ tends to 0 follows from (4.26). Note that, once again, the control of the small scale fluctuations is simpler when the increments of the walk are bounded ($|X_i| \leq C$ for some $C > 0$). Thus the fluctuations of $Z_n$ are irrelevant and for $\varepsilon$ small enough
$$
\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \|Z_n - f\|_\infty \leq \varepsilon \big)
\geq \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\Big( \bigcap_{i=1}^K \Big\{ \big| Z_n(t_{i-1}) - f(t_{i-1}) \big| \leq \frac{\varepsilon}{3} \Big\} \Big).
$$
The lower bound inequality (4.23) is then a consequence of Lemma 4.3 for the deviations of the finite dimensional marginals:
$$
\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \|Z_n - f\|_\infty \leq \varepsilon \big)
\geq - G_T\big( f(t_1), \dots, f(t_K) \big) \geq - G(f).
$$
Note that the sampling $t_1, \dots, t_K$ depends on $\varepsilon$.

4.2 Schilder’s Theorem

Consider the rescaled Brownian motion

∀t ∈ [0, 1], Bn(t) =1√n

B(t) (4.27)

and denote by νn the corresponding path measure in C0([0, 1], R

). When n tends to in-

finity, the paths Bn converge almost surely to 0. We are going to investigate the largedeviations of these paths.

Theorem 4.8 (Schilder). The sequence of measures {νn}n > 1 obeys a large deviation principlein C0

([0, 1], R

)with speed n and rate function

G( f ) =

{12

∫ 10 ds f (s)2, if f ∈H1,

+∞, otherwise,(4.28)

whereH1 =

{f ; ∀t ∈ [0, 1], f (t) =

∫ t

0ds f (s) with f ∈ L2([0, 1])

}.
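Before the proof, Schilder's rate can be sanity-checked on an endpoint event. Among all paths with $f(1) \geq a$, the cheapest is the straight line $f(t) = at$, of cost $a^2/2$; and indeed $-\frac{1}{n}\log \mathbb{P}(B_n(1) \geq a) \to a^2/2$ since $B_n(1) \sim \mathcal{N}(0, 1/n)$. A small numeric check (illustrative, not in the text):

```python
import math

# Endpoint check of Schilder's rate: B_n(1) = B(1)/sqrt(n) is N(0, 1/n),
# and the cheapest path with f(1) >= a is the line f(t) = a t, of cost a^2 / 2,
# so -(1/n) log P(B_n(1) >= a) converges to a^2 / 2.

def endpoint_rate(n, a):
    tail = 0.5 * math.erfc(a * math.sqrt(n / 2.0))   # P(N(0, 1/n) >= a)
    return -math.log(tail) / n

a = 0.8
assert abs(endpoint_rate(1000, a) - a * a / 2.0) < 0.01
```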


The proof will follow from Theorem 4.1 by comparing the rescaled Brownian motion to a random walk.

Proof. We are going to discretize the trajectory with a mesh $\frac{1}{n}$:
$$
B_n(t) = \frac{1}{\sqrt{n}} \sum_{k=1}^{\lfloor nt \rfloor} \Delta_k + \frac{1}{\sqrt{n}} \Big( B(t) - B\Big( \frac{\lfloor nt \rfloor}{n} \Big) \Big),
$$
where the increments are of the form
$$
\Delta_k = B\Big( \frac{k}{n} \Big) - B\Big( \frac{k-1}{n} \Big) = \frac{1}{\sqrt{n}}\, X_k
$$
and the $X_k$ are independent Gaussian variables $\mathcal{N}(0,1)$. Thus the rescaled Brownian motion coincides at any time $\frac{k}{n}$ with the process $Z_n$:
$$
B_n\Big( \frac{k}{n} \Big) = \frac{1}{n} \sum_{i=1}^{k} X_i = Z_n\Big( \frac{k}{n} \Big).
$$
The latter follows a large deviation principle with rate function (4.28) according to Mogulskii's Theorem 4.1 (recall that for standard Gaussian variables $\varphi^*(x) = x^2/2$). Thus it is enough to show that $B_n$ and $Z_n$ remain close on $[0,1]$ with very high probability, i.e.
$$
\forall \delta > 0, \qquad \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big( \|Z_n - B_n\|_\infty > \delta \big) = -\infty. \qquad (4.29)
$$
Since the processes $B_n$ and $Z_n$ coincide at the times $\frac{k}{n}$,
$$
\mathbb{P}\big( \|Z_n - B_n\|_\infty > \delta \big) \leq \sum_{k=1}^n \mathbb{P}\Big( \sup_{\frac{k-1}{n} \leq s \leq \frac{k}{n}} \big| Z_n(s) - B_n(s) \big| > \delta \Big),
$$
it will be enough to estimate the fluctuations of each process in the time intervals $[\frac{k-1}{n}, \frac{k}{n}]$. As $Z_n$ is linear on $[\frac{k-1}{n}, \frac{k}{n}]$, we get
$$
\mathbb{P}\Big( \sup_{\frac{k-1}{n} \leq s \leq \frac{k}{n}} \Big| Z_n(s) - Z_n\Big( \frac{k}{n} \Big) \Big| > \delta \Big) = \mathbb{P}\Big( \Big| \frac{1}{n} X_k \Big| > \delta \Big) \leq \exp\Big( - n^2 \frac{\delta^2}{2} \Big),
$$
by using the fact that the increments are Gaussian.

For the Brownian motion, the counterpart of Lemma 4.7 is

Lemma 4.9. For any integer $n$ and $u, \tau > 0$, one has
$$
\mathbb{P}\Big( \sup_{0 \leq t \leq \tau} \big| B_n(t) \big| > u \Big) \leq 4 \exp\Big( - n \frac{u^2}{2\tau} \Big).
$$
Applying the lemma with $\tau = 1/n$ and $u = \delta$ leads to the superexponential decay
$$
\mathbb{P}\Big( \sup_{\frac{k-1}{n} \leq s \leq \frac{k}{n}} \Big| B_n(s) - B_n\Big( \frac{k}{n} \Big) \Big| > \delta \Big) \leq 4 \exp\Big( - n^2 \frac{\delta^2}{2} \Big).
$$
This completes the upper bound (4.29).

