
1 Todo Lists

0.0.1 Typographic recommendations

1. For notation, to typeset a set, it is best to use the command \ensemble{x \in \Xset}{x^2\leq 2}, which produces: {x ∈ X : x² ≤ 2}.

2. To write X = Y P-a.s., that is, to get the correct spacing before the "almost surely", write: X=Y \eqspp \as[\PP]

3. In a proof, write

(i) mldkjfdsq mqsldfjlksdq mqlsdjkfds mldjflmksdqn mlkjdfmlqsd mlkj mlkjmlkjmk mlkjf slmkj mlkmlkjn mlkj mlkjlkj mlkjmlk mlk lmkjm kljmlkj kljkjmlmlkjkjml mlkj lkm kl,lmk

(ii) nfmlksdq mqljfds mlqs dfdsm mqlskdf dsqfdqsf mlqksdfn dsml fmqlkdfsdqfld mlqkf dq fk dqfmldkqf qdfmlkq dfdkl

using \begin{enumerate}[(i), wide=0pt, labelindent=\parindent]
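Putting the recommendations together, a minimal sketch of how these commands combine in a document body; \ensemble, \eqspp, \as, \PP and \Xset are assumed to be defined in the book's preamble:

```latex
% Sketch only: \ensemble, \eqspp, \as, \PP and \Xset come from the preamble.
\begin{enumerate}[(i), wide=0pt, labelindent=\parindent]
  \item Let $A = \ensemble{x \in \Xset}{x^2 \leq 2}$.
  \item On $A$ we have $X = Y \eqspp \as[\PP]$.
\end{enumerate}
```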

4. I think we will have to replace +∞ by ∞. I will do it when I can!! ▶ Done!

5. Say "Lebesgue's theorem" and not "the Lebesgue theorem".

• For theorems, the file Definitions/theorem_names.tex has been created. Use the command \nom_du_theoreme to obtain a uniform spelling based on the English Wikipedia.

▶ Randal's todo list

Coming up:

1. Put MCMC examples everywhere possible.

2. An elaborate example in the geometric case, but without too many computations.

3. Read Hairer's proof that "Strong Feller + one accessible state" implies uniqueness. I believe it is very simple, without using irreducibility; let us see what we can do with it... ▶ Done!

4. Put the slice sampler in the book after Hairer's result, in the section with the double kernel.

5. Dropbox: Chacon-Ornstein (Pierre).

6. Read and rework Chapter 12: it is not easy to read as it stands...

7. Completely redo Chapter 9!


2

8. Read Blackwell in Chapter 3. I have started.

9. Add an Observation Driven example.

10. Chapter 1: Replace the Gibbs sampler by Data Augmentation. ▶ Done!

11. Chapter 3: for discrete chains, look into identifying the variance and the rate in the convergence.

12. Chapter 2: proofreading. In Definition 2.1 I put "random variable" rather than "mapping". ▶ Done!

Could someone take a look:

If anyone has read it, a short comment right after the question below would be great!

1. Section ??: are the presentation of the slice sampler and the proofs clear or not?

2. I modified the definition of V-geometric ergodicity. I removed the requirement that π be stationary, since it is implied by the inequality. I added a comment below it. ▶ Definition 7.18

3. There is an impressive proof by Hairer in Theorem 10.15: could you check whether it is convincing?

4. I could not quite understand Theorem 11.29: OK, I have reworked the wording...

▶ Eric's todo list

▶ Philippe's todo list


Topics in Markov Chains

February 18, 2015

Springer


Contents

1 Markov Chains: Basic definitions and examples ... 1
   1.1 Markov chains ... 1
   1.2 Kernels ... 3
   1.3 Homogeneous Markov chains ... 9
   1.4 The Canonical Chain ... 11
   1.5 Invariant Measures and Stationarity ... 13
   1.6 Reversibility ... 15
   1.7 Topological properties of kernels ... 16
   1.8 Markov chains of order p ... 19
   1.9 Discrete State Space Examples ... 20
   1.10 Time Series Examples ... 25
   1.11 Markov Chain Monte Carlo Examples ... 36
   1.12 Two-stage Gibbs sampler ... 42

2 The strong Markov property and its applications ... 45
   2.1 Stopping Times ... 45
   2.2 The Shift operator and the Markov property ... 47
   2.3 Potential kernel, harmonic and superharmonic functions ... 52
   2.4 The Dirichlet and Poisson problems ... 56
   2.5 Riesz decomposition ... 60
   2.6 The Dynkin Formula ... 61
   2.7 Examples ... 63

3 Discrete State-Space Markov Chains ... 69
   3.1 Recurrence, Transience ... 69
   3.2 Communication ... 71
   3.3 Period ... 74
   3.4 Law of large numbers ... 77
   3.5 Recurrent irreducible Markov chains ... 79
   3.6 Drift conditions for recurrence and transience ... 83
   3.7 Convergence of the iterates of the Markov kernel ... 86
   3.8 Convergence of the Markov kernel ... 87
   3.9 Central Limit Theorem ... 89
   3.10 Examples ... 96

4 Discrete-time renewal theorems and rates of convergence with applications to discrete Markov chains ... 107
   4.1 Discrete-time renewal process: definition ... 107
   4.2 Forward and backward recurrence time chains ... 109
   4.3 Blackwell theorem ... 112
   4.4 Coupling time bounds for renewal sequences ... 114
   4.5 Rates of convergence in discrete renewal theorems ... 124
   4.6 Regularity and solidarity for discrete state Markov chains ... 125

5 Limit theorems for stationary ergodic Markov chains ... 129
   5.1 Dynamical systems ... 129
   5.2 Markov chains ergodicity ... 132
   5.3 Central Limit Theorems for Additive Functionals ... 136
   5.4 Examples ... 145

6 Uniformly ergodic Markov chains ... 147
   6.1 Total Variation Distance ... 147
   6.2 Fixed-point Theorem ... 152
   6.3 Dobrushin coefficient and uniform ergodicity ... 154
   6.4 A coupling proof of uniform ergodicity ... 157
   6.5 Examples ... 160

7 V-geometrically ergodic Markov chains ... 167
   7.1 V-total variation ... 167
   7.2 V-Dobrushin coefficient ... 169
   7.3 Drift and Minorization conditions ... 170
   7.4 Quantitative bounds for the V-Dobrushin coefficient ... 172
   7.5 Central Limit Theorem ... 176
   7.6 Coupling bounds ... 178
   7.7 Coupling for stochastically monotone Markov chains ... 186
   7.8 Examples ... 188

7 Subgeometrically ergodic Markov chains ... 193
   7.1 Subgeometric rates ... 193
   7.2 Subgeometric rate of convergence under a drift condition ... 194
   7.3 Moment bounds under the subgeometric drift condition ... 195
   7.4 Coupling bounds ... 198
   7.5 Proof of Theorem 7.3 ... 201
   7.6 Examples ... 204

8 Iterative Models ... 209
   8.1 Pathwise Lipschitz Models ... 210
   8.2 Lp Lipschitz models ... 216
   8.3 Law of large numbers ... 220
   8.4 Central limit theorem ... 223
   8.5 Examples ... 225

9 Geometric convergence in Wasserstein distance ... 229
   9.1 The Vasershtein distance ... 229
   9.2 Geometric convergence ... 237

10 Ergodicity of Markov Chains on metric spaces ... 237
   10.1 Feller Markov Chains ... 237
   10.2 Ultra Feller kernels ... 242
   10.3 Asymptotically ultra-Feller chains ... 244
   10.4 Examples ... 247

11 Irreducibility, small sets and aperiodicity ... 255
   11.1 Irreducibility ... 255
   11.2 Atom and Small sets ... 261
   11.3 Regular sets ... 264
   11.4 Petite sets ... 266
   11.5 Aperiodicity ... 269
   11.6 Irreducible aperiodic Markov chains ... 271
   11.7 T-chains ... 274
   11.8 Proof of Theorem 11.24 ... 280

12 Recurrence and Transience ... 285
   12.1 Recurrence and Transience ... 285
   12.2 Drift criterion for transience and recurrence ... 289
   12.3 Harris recurrence ... 291
   12.4 Evanescence on locally compact space ... 294
   12.5 Topological recurrence ... 296
   12.6 Examples ... 298

13 Invariant Measure under irreducibility ... 307
   13.1 Existence of invariant measures ... 307
   13.2 The Orey Theorem ... 313

14 Regenerative decomposition for Harris Chains ... 319
   14.1 Regenerative decomposition ... 319
   14.2 Positive and Null Chains ... 325
   14.3 Drift conditions ... 331
   14.4 Convergence for integrable functions ... 334
   14.5 Polynomial ergodicity ... 341
   14.6 Geometric ergodicity ... 346

15 Reversible Markov Chains ... 351
   15.1 Operator theory essentials ... 351
   15.2 Self-adjoint operators ... 355
   15.3 Spectral measure ... 357
   15.4 Geometric ergodicity ... 359
   15.5 Reversibility ... 360
   15.6 Spectral gap ... 361
   15.7 Central Limit Theorem for reversible Markov chains ... 363
   15.8 Ordering the asymptotic covariances ... 367

16 Mixing ... 369
   16.1 Definitions ... 369
   16.2 Conditional independence and mixing ... 374
   16.3 Markov chains ... 377

Appendices

Appendix A Topology, Measure and probability ... 381
   A.1 Some topology ... 381
   A.2 Some measure theory ... 382
   A.3 Some probability theory ... 386

Appendix B Weak Convergence ... 391
   B.1 Probability on metric spaces ... 391
   B.2 Weak* convergence ... 392
   B.3 Weak Convergence ... 393
   B.4 Prohorov distance ... 395
   B.5 Tightness ... 397
   B.6 Strassen's Theorem ... 398

Appendix C Metrics and Weak convergence ... 401
   C.1 β-distance ... 401
   C.2 γ-distance ... 402
   C.3 Weighted γ-distance ... 405
   C.4 Wasserstein distance ... 407

Appendix D Martingales ... 413
   D.1 Definitions and Elementary properties ... 413
   D.2 Doob decomposition ... 414
   D.3 Martingale inequalities ... 415
   D.4 Martingale convergence theorems ... 417
   D.5 Law of large numbers ... 418
   D.6 Central Limit Theorems ... 419

References ... 423

Index ... 425


Frequently used notations

Sets and Numbers

• N: the set of natural numbers including zero, N = {0, 1, 2, ...}.
• N*: the set of natural numbers excluding zero, N* = {1, 2, ...}.
• N̄: the extended set of natural numbers, N̄ = N ∪ {∞}.
• Z: the set of integers, Z = {0, ±1, ±2, ...}.
• R: the set of real numbers.
• R^d: Euclidean space consisting of all column vectors x = (x1, ..., xd)′.
• R̄: the extended real line, i.e. R ∪ {−∞, ∞}.
• ⌈x⌉: the smallest integer greater than or equal to x.
• ⌊x⌋: the largest integer smaller than or equal to x.
• If a = {a(n), n ∈ Z} and b = {b(n), n ∈ Z} are two sequences, a ∗ b denotes the convolution of a and b, defined formally by a ∗ b(n) = ∑_{k∈Z} a(k)b(n − k). The j-th convolution power of the sequence a is denoted a^{∗j}, with a^{∗0}(0) = 1 and a^{∗0}(k) = 0 if k ≠ 0.
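As an illustration of the convolution notation, a small sketch in Python (the function names and the dict representation of finitely supported sequences are ours, not the book's):

```python
# Convolution of finitely supported sequences on Z, stored as
# dicts {index: value}; (a * b)(n) = sum_k a(k) b(n - k).
def convolve(a, b):
    out = {}
    for k, ak in a.items():
        for m, bm in b.items():
            out[k + m] = out.get(k + m, 0) + ak * bm
    return out

def conv_power(a, j):
    """j-th convolution power a^{*j}; a^{*0} is the unit sequence delta_0."""
    result = {0: 1}  # a^{*0}(0) = 1 and a^{*0}(k) = 0 otherwise
    for _ in range(j):
        result = convolve(result, a)
    return result
```

For instance, for a fair coin increment a = {0: 0.5, 1: 0.5}, conv_power(a, 2) returns the Binomial(2, 1/2) weights {0: 0.25, 1: 0.5, 2: 0.25}.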

Metric space

• (X, d): a metric space.
• B(x, r): the open ball of radius r > 0 centred at x, B(x, r) = {y ∈ X : d(x, y) < r}.
• Ū: closure of the set U ⊂ X.
• ∂U: boundary of the set U ⊂ X.


Binary relations

• a ∧ b: the minimum of a and b.
• a ∨ b: the maximum of a and b.
• a(n) ≍ b(n): the ratio of the two sides is bounded from above and below by positive constants that do not depend on n.
• a(n) ∼ b(n): the ratio of the two sides converges to one.

Vectors, matrices

• M_d(R) (resp. M_d(C)): the set of d × d matrices with real (resp. complex) coefficients.
• For M ∈ M_d(C) and ‖·‖ any norm on C^d, |||M||| is the operator norm, defined as |||M||| = sup{‖Mx‖/‖x‖ : x ∈ C^d, x ≠ 0}.
• I_d: the d × d identity matrix.
• Let A and B be m × n and p × q matrices, respectively. Then the Kronecker product A ⊗ B of A with B is the mp × nq matrix whose (i, j)-th block is the p × q matrix A_{i,j}B, where A_{i,j} is the (i, j)-th element of A. Note that the Kronecker product is associative, (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C), and that (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD) for matrices with compatible dimensions.
• Let A be an m × n matrix. Then Vec(A) is the (mn × 1) vector obtained from A by stacking the columns of A (from left to right). Note that Vec(ABC) = (C^T ⊗ A) Vec(B).
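Both identities are easy to check numerically; a sketch using NumPy (the helper `vec` and the random shapes are ours, chosen only to make the products conformable):

```python
import numpy as np

rng = np.random.default_rng(0)

# Vec identity: Vec(ABC) = (C^T kron A) Vec(B), with column-stacking Vec.
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

def vec(M):
    # stack the columns of M from left to right (column-major flattening)
    return M.flatten(order="F")

assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# Mixed-product property: (A kron B)(C kron D) = (AC) kron (BD),
# for matrices with compatible dimensions.
A2, C2 = rng.standard_normal((2, 3)), rng.standard_normal((3, 4))
B2, D2 = rng.standard_normal((5, 2)), rng.standard_normal((2, 3))
assert np.allclose(np.kron(A2, B2) @ np.kron(C2, D2),
                   np.kron(A2 @ C2, B2 @ D2))
```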

Functions

• 1_A: indicator function, with 1_A(x) = 1 if x ∈ A and 0 otherwise. The notation 1{A} is used if A is a composite statement.
• f⁺: the positive part of the function f, i.e. f⁺(x) = f(x) ∨ 0.
• f⁻: the negative part of the function f, i.e. f⁻(x) = −(f(x) ∧ 0).
• f⁻¹(A): inverse image of the set A by f.
• For f a real-valued function on X, |f|_∞ = sup{|f(x)| : x ∈ X} is the supremum norm and osc(f) is the oscillation seminorm, defined as

osc(f) = sup_{(x,y)∈X×X} |f(x) − f(y)| = 2 inf_{c∈R} |f − c|_∞ . (0.1)

• A nonnegative (resp. positive) function is a function with values in [0, ∞] (resp. (0, ∞]).


• A nonnegative (resp. positive) real-valued function is a function with values in [0, ∞) (resp. (0, ∞)).
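On a finite set, the identity (0.1) can be checked directly: the infimum over c is attained at the midrange of f. A small Python sketch (the values are illustrative):

```python
# osc(f) = sup |f(x) - f(y)| equals 2 inf_c |f - c|_inf: for finitely
# many values, the optimal centering c* is the midrange of f.
vals = [3, -1, 7, 0]                  # values of f on a finite "space" X
osc = max(vals) - min(vals)           # sup over pairs of |f(x) - f(y)|
c_star = (max(vals) + min(vals)) / 2  # midrange minimizes |f - c|_inf
assert 2 * max(abs(v - c_star) for v in vals) == osc
```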

Function spaces

Let (X,X ) be a measurable space.

• F(X, X): the vector space of measurable functions from (X, X) to (−∞, ∞).
• F₊(X, X): the cone of measurable functions from (X, X) to [0, ∞].
• F_b(X, X): the subset of F(X, X) of bounded functions.
• For any ξ ∈ M_s(X) and f ∈ F_b(X, X), ξ(f) = ∫ f dξ.
• Any ξ ∈ M_s(X) defines a linear functional on the Banach space (F_b(X, X), |·|_∞).
• If X is a topological space,

– C_b(X) is the space of all bounded continuous real functions defined on X;
– C(X) is the space of all continuous real functions defined on X;
– U_b(X) is the space of all bounded uniformly continuous real functions defined on X;
– U(X) is the space of all uniformly continuous real functions defined on X;
– Lip_b(X) is the space of all bounded Lipschitz real functions defined on X;
– Lip(X) is the space of all Lipschitz real functions defined on X.

• 𝓛^p(µ): the space of measurable functions f such that ∫ |f|^p dµ < ∞.
• L^p(µ): the space of classes of µ-equivalent functions in 𝓛^p(µ). For a class f ∈ L^p(µ), ‖f‖_p = (∫ |f|^p dµ)^{1/p}, where f is any representative of the class. When no confusion is possible, we identify the class and its representatives.

Measures

Let (X,X ) be a measurable space.

• δ_x: Dirac measure with mass concentrated at x, i.e. δ_x(A) = 1 if x ∈ A and 0 otherwise.
• Leb: Lebesgue measure on R^d.
• M_s(X): the set of finite signed measures on the measurable space (X, X).
• M₊(X): the set of measures on the measurable space (X, X).
• M₁(X): the set of probability measures on (X, X).
• M₀(X): the set of finite signed measures ξ on (X, X) satisfying ξ(X) = 0.
• µ ≪ ν: µ is absolutely continuous w.r.t. ν.

If X is a topological space (in particular a metric space), then X is always taken to be the Borel σ-field generated by the topology of X. If X = R^d, its Borel σ-field is denoted by B(R^d).


• supp(µ): the (topological) support of a measure µ on a metric space.
• µ_n ⇒^w µ: the sequence of probability measures {µ_n, n ∈ N} converges weakly to µ, i.e. for any h ∈ C_b(X), lim_{n→∞} µ_n(h) = µ(h).

The topological space X is locally compact if every point x ∈ X has a compact neighborhood.

• C₀(X): the Banach space of continuous functions that vanish at infinity.
• µ_n ⇒^{∗w} µ: the sequence of σ-finite measures {µ_n, n ∈ N} converges *weakly to µ, i.e. lim_{n→∞} µ_n(h) = µ(h) for all h ∈ C₀(X).

Probability space

Let (Ω, F, P) be a probability space. A random variable X is a measurable mapping from (Ω, F) to (X, X).

• E(X), E[X]: the expectation of a random variable X with respect to the probability P.
• Cov(X, Y): covariance of the random variables X and Y.
• Given a sub-σ-field G of F and A ∈ F, P(A | G) is the conditional probability of A given G and E[X | G] is the conditional expectation of X given G.
• L_P(X): the distribution of X on (X, X) under P, i.e. the image of P under X.
• X_n ⇒^P X: the sequence of random variables (X_n) converges to X in distribution under P.
• X_n →^{P-prob} X: the sequence of random variables (X_n) converges to X in probability under P.
• X_n →^{P-a.s.} X: the sequence of random variables (X_n) converges to X P-almost surely.

Usual distributions

• B(n, p): binomial distribution with n trials and success probability p.
• N(µ, σ²): normal distribution with mean µ and variance σ².
• U(a, b): uniform distribution on [a, b].
• χ²: chi-square distribution.
• χ²_n: chi-square distribution with n degrees of freedom.


Chapter 1

Markov Chains: Basic definitions and examples

1.1 Markov chains

Definition 1.1 (Stochastic Process) Let (Ω, F, P) be a probability space and (X, X) be a measurable space.

(i) A sequence {X_k, k ∈ N} of random variables with values in (X, X) is called an X-valued stochastic process.

(ii) A filtration of a measurable space (Ω, F) is an increasing sequence {F_k, k ∈ N} of sub-σ-fields of F. A filtered probability space (Ω, F, {F_k, k ∈ N}, P) is a probability space endowed with a filtration.

(iii) A stochastic process {X_k, k ∈ N} is said to be adapted to the filtration {F_k, k ∈ N} if for each k ∈ N, X_k is F_k-measurable.

The notation {(X_k, F_k), k ∈ N} will be used to indicate that the process {X_k, k ∈ N} is adapted to the filtration {F_k, k ∈ N}. The σ-field F_k can be thought of as the information available at time k. Requiring the process to be adapted means that probabilities related to X_k can be computed using solely the information available at time k.

Definition 1.2 (Natural filtration) Let (Ω, F, P) be a probability space and {X_k, k ∈ N} be a stochastic process. The natural filtration of the process {X_k, k ∈ N} is the filtration {F^X_k, k ∈ N} defined by

F^X_k = σ(X_j : 0 ≤ j ≤ k) , k ∈ N .

By definition, a stochastic process is adapted to its natural filtration. The main definition of this chapter can now be stated.

Definition 1.3 (Markov Chain) Let (Ω, F, {F_k, k ∈ N}, P) be a filtered probability space. An adapted stochastic process {(X_k, F_k), k ∈ N} is a Markov chain if, for all k ∈ N and A ∈ X,

P(X_{k+1} ∈ A | F_k) = P(X_{k+1} ∈ A | X_k) P-a.s. (1.1)


Condition (1.1) is equivalent to the following condition: for all f ∈ F₊(X, X) ∪ F_b(X, X),

E[f(X_{k+1}) | F_k] = E[f(X_{k+1}) | X_k] P-a.s. (1.2)

Let {G_k, k ∈ N} be another filtration satisfying G_k ⊂ F_k for all k ∈ N. If {(X_k, F_k), k ∈ N} is a Markov chain and {X_k, k ∈ N} is adapted to the filtration {G_k, k ∈ N}, then {(X_k, G_k), k ∈ N} is also a Markov chain. In particular, a Markov chain {(X_k, F_k), k ∈ N} is always a Markov chain with respect to the natural filtration {F^X_k, k ∈ N}.

Proposition 1.4 Let {(X_k, F_k), k ∈ N} be an adapted stochastic process. The following properties are equivalent:

(i) {(X_k, F_k), k ∈ N} is a Markov chain,

(ii) for each k ∈ N and bounded σ(X_j, j ≥ k)-measurable random variable Y,

E[Y | F_k] = E[Y | X_k] P-a.s. (1.3)

(iii) for each k ∈ N, each bounded σ(X_j, j ≥ k)-measurable random variable Y and each bounded σ(X_j, j ≤ k)-measurable random variable Z,

E[YZ | X_k] = E[Y | X_k] E[Z | X_k] P-a.s. (1.4)

Proof. (i) ⇒ (ii) The proof is by induction. Consider the property

(P_n): (1.3) holds for all Y = ∏_{j=0}^{n} g_j(X_{k+j}) where g_j ∈ F_b(X, X) for all j ≥ 0.

(P_0) is true. Assume that (P_n) holds and let {g_j, j ∈ N} be a sequence of functions in F_b(X, X). The Markov property (1.2) yields

E[g_0(X_k) ··· g_n(X_{k+n}) g_{n+1}(X_{k+n+1}) | F_k]
  = E[ E[g_0(X_k) ··· g_n(X_{k+n}) g_{n+1}(X_{k+n+1}) | F_{k+n}] | F_k ]
  = E[ g_0(X_k) ··· g_n(X_{k+n}) E[g_{n+1}(X_{k+n+1}) | F_{k+n}] | F_k ]
  = E[ g_0(X_k) ··· g_n(X_{k+n}) E[g_{n+1}(X_{k+n+1}) | X_{k+n}] | F_k ] .

Applying the induction assumption (P_n) yields

E[g_0(X_k) ··· g_n(X_{k+n}) g_{n+1}(X_{k+n+1}) | F_k]
  = E[ g_0(X_k) ··· g_n(X_{k+n}) E[g_{n+1}(X_{k+n+1}) | X_{k+n}] | X_k ]
  = E[ g_0(X_k) ··· g_n(X_{k+n}) E[g_{n+1}(X_{k+n+1}) | F_{k+n}] | X_k ]
  = E[ g_0(X_k) ··· g_n(X_{k+n}) g_{n+1}(X_{k+n+1}) | X_k ] ,

showing that (P_{n+1}) holds. Therefore, (P_n) is true for all n ∈ N. Consider the set

H = { Y bounded and σ(X_j, j ≥ k)-measurable : E[Y | F_k] = E[Y | X_k] P-a.s. } .


It is easily seen that H is a vector space. In addition, if Yn, n ∈ N is an increasing sequence of nonnegative random variables in H and if Y = limn→∞ Yn is bounded, then by the monotone convergence theorem for conditional expectations,

E [Y |Fk] = limn→∞ E [Yn|Fk] = limn→∞ E [Yn|Xk] = E [Y |Xk] P-a.s.

By Theorem A.22, the space H contains all bounded σ(Xj, j ≥ k)-measurable random variables.

(ii) ⇒ (iii): If Y is a bounded σ(Xj, j ≥ k)-measurable random variable and Z is a bounded Fk-measurable random variable, an application of (ii) yields

E [Y Z|Fk] = Z E [Y |Fk] = Z E [Y |Xk] P-a.s.

Thus,

E [Y Z|Xk] = E [E [Y Z|Fk] |Xk] = E [Z E [Y |Xk] |Xk] = E [Z|Xk] E [Y |Xk] P-a.s.

(iii) ⇒ (i): If Z is bounded and Fk-measurable, we obtain

E [ f (Xk+1)Z] = E [E [ f (Xk+1)Z|Xk]] = E [E [ f (Xk+1)|Xk] E [Z|Xk]] = E [E [ f (Xk+1)|Xk] Z] ,

showing (i). □

Heuristically, Condition (1.4) means that the future of a Markov chain is conditionally independent of its past, given its present state.

An important caveat must be made: the Markov property is not hereditary. If Xk, k ∈ N is a Markov chain on X and f is a measurable function from (X,X ) to (Y,Y ), then, unless f is one-to-one, f (Xk), k ∈ N need not be a Markov chain. In particular, if X is a product space, X = X1 × X2, and Xk = (X1,k, X2,k), k ≥ 0, then the sequence X1,k, k ≥ 0 may not be a Markov chain.

1.2 Kernels

Definition 1.5 Let (X,X ) and (Y,Y ) be two measurable spaces. A kernel N on X × Y is a mapping N : X × Y → [0,∞] satisfying the following conditions:

(i) for every x ∈ X, the mapping N(x, ·) : A 7→ N(x,A) is a measure on Y ;
(ii) for every A ∈ Y , the mapping N(·,A) : x 7→ N(x,A) is a measurable function from (X,X ) to [0,∞].

• N is said to be finite if N(x,Y) < ∞ for all x ∈ X.
• N is said to be bounded if supx∈X N(x,Y) < ∞.


• N is called a Markov kernel if N(x,Y) = 1 for all x ∈ X.
• N is said to be sub-Markovian if N(x,Y) ≤ 1 for all x ∈ X.

Example 1.6 (Discrete state-space kernel). Assume that X and Y are countable sets. Each element x ∈ X is then called a state. A kernel N on X × P(Y), where P(Y) is the power set of Y, is specified by a (possibly doubly infinite) matrix N = (N(x,y) : (x,y) ∈ X × Y) with nonnegative entries. Each row (N(x,y) : y ∈ Y) defines a measure on (Y,P(Y)) by

N(x,A) = ∑_{y∈A} N(x,y) ,

for A ⊂ Y. The matrix N is said to be Markovian if every row (N(x,y) : y ∈ Y) is a probability on (Y,P(Y)), i.e. ∑_{y∈Y} N(x,y) = 1 for all x ∈ X. The associated kernel is defined by N(x,{y}) = N(x,y) for all (x,y) ∈ X × Y.
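A minimal sketch of Example 1.6 in Python: a finite-state kernel is just a matrix with nonnegative entries whose rows, seen as measures, are evaluated on subsets of states. The two-state matrix below is a hypothetical example, not taken from the text.

```python
# A kernel on the finite state space {0, 1}, stored as a matrix with
# nonnegative entries.  Rows sum to 1, so the matrix is Markovian
# (hypothetical values).
N = [[0.9, 0.1],
     [0.4, 0.6]]

def kernel(N, x, A):
    """N(x, A) = sum of N(x, y) over y in A (the row x seen as a measure)."""
    return sum(N[x][y] for y in A)

# every row is a probability on the state space {0, 1}
assert all(abs(sum(row) - 1.0) < 1e-12 for row in N)
print(kernel(N, 0, {1}))      # N(0, {1}) = 0.1
print(kernel(N, 1, {0, 1}))   # N(1, X) = 1.0
```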

Example 1.7 (Measure seen as a kernel). A σ-finite positive measure ν on a space (Y,Y ) can be seen as a kernel on X × Y by setting N(x,A) = ν(A) for all x ∈ X and A ∈ Y . It is a Markov kernel if and only if ν is a probability measure.

Example 1.8 (Kernel density). Let λ be a positive σ-finite measure on (Y,Y ) and let n : X × Y → R+ be a nonnegative function, measurable with respect to the product σ-field X ⊗ Y . Then the mapping N defined on X × Y by

N(x,A) = ∫_A n(x,y) λ(dy)

is a kernel. The function n is called the density of the kernel N w.r.t. the measure λ. The kernel N is Markovian if and only if ∫_Y n(x,y) λ(dy) = 1 for all x ∈ X.

Let N be a kernel and f ∈ F+(Y,Y ). A function N f : X → R+ is defined by setting, for any x ∈ X,

N f (x) = ∫_Y N(x,dy) f (y) .

Proposition 1.9 Let N be a kernel on X × Y . Then N f ∈ F+(X,X ) for all f ∈ F+(Y,Y ). Moreover, if N is a Markov kernel, then N f ∈ Fb(X,X ) for all f ∈ Fb(Y,Y ).

Proof. Assume first that f is a simple nonnegative function, i.e. f = ∑_{i∈I} βi 1Bi for a finite collection of nonnegative numbers βi and sets Bi ∈ Y . Then, for x ∈ X, N f (x) = ∑_{i∈I} βi N(x,Bi), and by property (ii) of Definition 1.5, the function N f is measurable. Let now f ∈ F+(Y,Y ) and let fn, n ∈ N be a nondecreasing sequence of measurable nonnegative simple functions such that limn→∞ fn(x) = f (x). Then, by the monotone convergence theorem, for all x ∈ X,

N f (x) = limn→∞ N fn(x) .


Therefore, N f is the pointwise limit of a sequence of nonnegative measurable functions, hence is measurable.

If N is a Markov kernel and f ∈ Fb(Y,Y ), then for all x ∈ X,

N f (x) = ∫_Y f (y) N(x,dy) ≤ | f |∞ ∫_Y N(x,dy) = | f |∞ N(x,Y) = | f |∞ . □

With a slight abuse of notation, we will use the same symbol N for the kernel and the associated operator N : F+(Y,Y ) → F+(X,X ), f 7→ N f . This operator is additive and positively homogeneous, i.e. for all f, g ∈ F+(Y,Y ) and α ∈ R+, it holds that N( f + g) = N f + N g and N(α f ) = α N f . In addition, if fn, n ∈ N is a nondecreasing sequence of functions in F+(Y,Y ), then limn→∞ N fn = N(limn→∞ fn) by the monotone convergence theorem. The following lemma shows that every additive and positively homogeneous mapping satisfying this property may be associated to a kernel.

Lemma 1.10 Let M : F+(Y,Y ) → F+(X,X ) be an additive and positively homogeneous mapping such that limn→∞ M( fn) = M(limn→∞ fn) for every nondecreasing sequence fn, n ∈ N of functions in F+(Y,Y ). Then the function N defined on X × Y by N(x,A) = M(1A)(x) is a kernel, and M( f )(x) = ∫_Y N(x,dy) f (y) for all f ∈ F+(Y,Y ).

Proof. Since M is additive, for each x ∈ X, the function A 7→ N(x,A) is additive. Indeed, for n ∈ N∗ and pairwise disjoint A1, . . . , An ∈ Y , we have

N(x, ⋃_{i=1}^{n} Ai) = M(∑_{i=1}^{n} 1Ai)(x) = ∑_{i=1}^{n} M(1Ai)(x) = ∑_{i=1}^{n} N(x,Ai) .

Let Ai, i ∈ N ⊂ Y be a sequence of pairwise disjoint sets. Then, by additivity and the monotone convergence property of M, we have, for all x ∈ X,

N(x, ⋃_{i=1}^{∞} Ai) = M(∑_{i=1}^{∞} 1Ai)(x) = ∑_{i=1}^{∞} M(1Ai)(x) = ∑_{i=1}^{∞} N(x,Ai) .

This proves that A 7→ N(x,A) is a measure on (Y,Y ). Thus N is a kernel on X × Y . The last statement follows since any function f ∈ F+(Y,Y ) is a nondecreasing limit of simple functions. □

By setting N f = N f+ − N f−, we may extend the mapping f 7→ N f to all functions f ∈ F(Y,Y ) such that N f+ and N f− are not both infinite. We will also use the notation N(x, f ) for N f (x), and N(x,1A) or N1A(x) for N(x,A).

Kernels also act on measures. Let µ be a positive measure on (X,X ) and, for A ∈ Y , define

µN(A) = ∫_X µ(dx) N(x,A) .


Proposition 1.11 Let N be a kernel on X × Y and µ ∈ M+(X ). Then µN ∈ M+(Y ). If N is a Markov kernel, then µN(Y) = µ(X).

Proof. Note first that µN(A) ≥ 0 for all A ∈ Y and µN(∅) = 0 since N(x,∅) = 0 for all x ∈ X. Therefore, it suffices to establish the countable additivity of µN. Let Ai, i ∈ N ⊂ Y be a sequence of pairwise disjoint sets. Since N(x, ·) is a measure for all x ∈ X, countable additivity implies that N(x, ⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} N(x,Ai). Moreover, the function x 7→ N(x,Ai) is nonnegative and measurable for all i ∈ N, thus the monotone convergence theorem yields

µN(⋃_{i=1}^{∞} Ai) = ∫ µ(dx) N(x, ⋃_{i=1}^{∞} Ai) = ∑_{i=1}^{∞} ∫ µ(dx) N(x,Ai) = ∑_{i=1}^{∞} µN(Ai) . □

1.2.1 Composition of kernels

Proposition 1.12 (Composition of kernels) Let M and N be two kernels on X × Y and Y × Z , respectively. There exists a kernel MN on X × Z , called the composition or the product of M and N, defined for all x ∈ X and A ∈ Z by

MN(x,A) = ∫_Y M(x,dy) N(y,A) . (1.5)

Proof. For any A ∈ Z , y 7→ N(y,A) is a measurable function, and by Proposition 1.9, x 7→ ∫_Y M(x,dy) N(y,A) is a measurable function. For any x ∈ X, M(x, ·) is a measure on (Y,Y ), and by Proposition 1.11, A 7→ ∫_Y M(x,dy) N(y,A) is a measure on (Z,Z ). Hence, the mapping (x,A) 7→ ∫_Y M(x,dy) N(y,A) is a kernel on X × Z . □

Since MN is a kernel on X × Z , for any f ∈ F+(Z,Z ) we can define the function MN f : x 7→ MN f (x), which by Proposition 1.9 belongs to F+(X,X ). On the other hand, N f : y 7→ N f (y) is a function belonging to F+(Y,Y ), and since M is a kernel on X × Y , we may consider the function x 7→ M[N f ](x). A natural question to ask is whether these two quantities are equal.

Proposition 1.13 Let M be a kernel on X × Y and N be a kernel on Y × Z . Then, for each x ∈ X and f ∈ F+(Z,Z ),

MN f (x) = M[N f ](x) . (1.6)

Proof. Let f = ∑_{i=1}^{n} αi 1Ai , Ai ∈ Z , be a simple function. Then,

MN f (x) = ∑_{i=1}^{n} αi MN(x,Ai) = ∫_Y M(x,dy) ∑_{i=1}^{n} αi N(y,Ai) = M[N f ](x) . (1.7)

Let f ∈ F+(Z,Z ). The function f is the pointwise limit of a nondecreasing sequence of nonnegative simple functions fn, n ∈ N. Note that the sequence of functions N fn, n ∈ N is also nondecreasing, and by the monotone convergence theorem, limn→∞ N fn(x) = N f (x). Therefore, applying again the monotone convergence theorem and (1.7), we have

MN f (x) = limn→∞ MN fn(x) = limn→∞ M[N fn](x) = M[N f ](x) . □

Given a Markov kernel N on X × X , we may define the n-th power of this kernel iteratively. For x ∈ X and A ∈ X , we set N0(x,A) = δx(A) and, for n ≥ 1, we define Nn inductively by

Nn(x,A) = ∫_X N(x,dy) N^{n−1}(y,A) . (1.8)

For integers k, n ≥ 0, this yields the Chapman-Kolmogorov equations:

N^{n+k}(x,A) = ∫_X Nn(x,dy) Nk(y,A) . (1.9)

In the case of a discrete state space X, a kernel N can be seen as a matrix with nonnegative entries indexed by X. Then the k-th power of the kernel, Nk, defined in (1.8), is simply the k-th power of the matrix N.
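On a finite state space, the Chapman-Kolmogorov equations (1.9) therefore reduce to matrix identities, which can be checked numerically. The two-state kernel below is a hypothetical example.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(N, n):
    """N^n with the convention N^0(x, .) = delta_x, i.e. the identity matrix."""
    P = [[float(i == j) for j in range(len(N))] for i in range(len(N))]
    for _ in range(n):
        P = matmul(P, N)
    return P

N = [[0.9, 0.1],
     [0.4, 0.6]]          # hypothetical two-state Markov kernel

# Chapman-Kolmogorov: N^{n+k} = N^n N^k, here with n = 3 and k = 2
lhs, rhs = power(N, 5), matmul(power(N, 3), power(N, 2))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```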

1.2.2 Tensor products of kernels

Proposition 1.14 Let M be a kernel on X × Y and N be a kernel on Y × Z . Then there exists a kernel M ⊗ N on X × (Y ⊗ Z ), called the tensor product of M and N, such that, for all f ∈ F+(Y × Z, Y ⊗ Z ),

M ⊗ N f (x) = ∫_Y M(x,dy) ∫_Z f (y,z) N(y,dz) . (1.10)

Proof. Define the mapping I : F+(Y × Z, Y ⊗ Z ) → F+(X,X ) by

I f (x) = ∫_Y M(x,dy) ∫_Z f (y,z) N(y,dz) .

The mapping I is additive and positively homogeneous. In addition, for any nondecreasing sequence fn, n ∈ N, we have I(limn→∞ fn) = limn→∞ I( fn) by the monotone convergence theorem. Therefore, Lemma 1.10 shows that (1.10) defines a kernel on X × (Y ⊗ Z ). □

Remark 1.15 If ν is a σ-finite positive measure on (X,X ) and N is a kernel on X × Y , then we can also define the tensor product of ν and N, denoted by ν ⊗ N, which is the measure on (X × Y, X ⊗ Y ) defined by

ν ⊗ N(A × B) = ∫_A ν(dx) N(x,B) .

Lemma 1.16 Let (X,X ), (Y,Y ), and (Z,Z ) be measurable spaces. Let M be a kernel on X × Y and N be a kernel on Y × Z .

• If M and N are both finite (resp. bounded) kernels, then M ⊗ N is a finite (resp. bounded) kernel.
• If M and N are both Markov kernels, then M ⊗ N is a Markov kernel.
• If (U,U ) is a measurable space and P is a kernel on Z × U , then (M ⊗ N) ⊗ P = M ⊗ (N ⊗ P), i.e. the tensor product of kernels is associative.

The proof is omitted. For n ≥ 1, the n-th tensor power P⊗n of a kernel P on X × X is the kernel on X × X ⊗n defined by

P⊗n f (x) = ∫_{X^n} f (x1, . . . , xn) P(x,dx1) P(x1,dx2) · · · P(x_{n−1},dxn) . (1.11)

1.2.3 Sampled kernel and resolvent kernel

Definition 1.17 (Sampled kernel, m-skeleton, resolvent kernel) Let a be a probability measure on N, that is, a sequence a(n), n ∈ N such that a(n) ≥ 0 for all n and ∑_{n=0}^{∞} a(n) = 1. Let P be a Markov kernel on (X,X ). The sampled kernel Ka is defined by

Ka(x,A) = ∑_{n=0}^{∞} a(n) Pn(x,A) , x ∈ X, A ∈ X . (1.12)

(i) If a = δm for an integer m ∈ N, then Kδm = Pm is the kernel of the so-called m-skeleton.
(ii) If aε is the geometric distribution with success probability 1 − ε, ε ∈ (0,1), i.e.

aε(n) = (1 − ε)ε^n , n ∈ N , (1.13)

then Kaε is referred to as the resolvent kernel.
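On a finite state space, the resolvent (1.12)-(1.13) is a geometric mixture of matrix powers and can be approximated by truncating the series; the kernel and the value of ε below are hypothetical choices.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def resolvent(N, eps, terms=400):
    """K_{a_eps} = sum_n (1 - eps) eps**n N^n, truncated after `terms`
    powers; the neglected geometric tail has total mass eps**terms."""
    d = len(N)
    K = [[0.0] * d for _ in range(d)]
    P = [[float(i == j) for j in range(d)] for i in range(d)]   # N^0
    for n in range(terms):
        w = (1.0 - eps) * eps ** n
        for i in range(d):
            for j in range(d):
                K[i][j] += w * P[i][j]
        P = matmul(P, N)    # advance to N^{n+1}
    return K

N = [[0.9, 0.1],
     [0.4, 0.6]]            # hypothetical two-state Markov kernel
K = resolvent(N, eps=0.5)
# K is (up to truncation) again a Markov kernel: rows sum to 1
assert all(abs(sum(row) - 1.0) < 1e-9 for row in K)
```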

If a and b are two sequences of real numbers, then a ∗ b is the convolution of a and b defined, for n ∈ N, by

a ∗ b(n) = ∑_{k=0}^{n} a(k) b(n − k) .

Lemma 1.18 If a and b are probability measures on N, then the sampled kernels Ka and Kb satisfy the generalized Chapman-Kolmogorov equation

Ka∗b(x,A) = ∫_X Ka(x,dy) Kb(y,A) . (1.14)


Proof. Applying the definition of the sampled kernel and the Chapman-Kolmogorov equation (1.9) yields (note that all the terms in the sums below are nonnegative)

Ka∗b(x,A) = ∑_{n=0}^{∞} Pn(x,A) a ∗ b(n) = ∑_{n=0}^{∞} Pn(x,A) ∑_{m=0}^{n} a(m) b(n − m)
= ∑_{n=0}^{∞} ∑_{m=0}^{n} ∫ Pm(x,dy) P^{n−m}(y,A) a(m) b(n − m)
= ∫ ∑_{m=0}^{∞} Pm(x,dy) a(m) ∑_{n=m}^{∞} P^{n−m}(y,A) b(n − m) = ∫ Ka(x,dy) Kb(y,A) . □

1.3 Homogeneous Markov chains

Definition 1.19 (Homogeneous Markov Chain) Let (X,X ) be a measurable space, let ν be a probability measure on (X,X ), and let P be a Markov kernel on X × X . Let (Ω, F, {Fk, k ∈ N}, P) be a filtered probability space. An adapted stochastic process (Xk,Fk), k ∈ N is called a homogeneous Markov chain with Markov kernel P and initial distribution ν if, for all k ≥ 0 and A ∈ X ,

(i) P(X0 ∈ A) = ν(A);
(ii) P(Xk+1 ∈ A |Fk) = P(Xk,A) P-a.s.

Remark 1.20 Condition (ii) is equivalent to E [ f (Xk+1)|Fk] = P f (Xk) P-a.s. for all f ∈ F+(X,X ) ∪ Fb(X,X ).

Remark 1.21 Assume that (Xk,Fk), k ∈ N is a homogeneous Markov chain. Then (Xk,F^X_k), k ∈ N is also a homogeneous Markov chain. Unless specified otherwise, we will always consider the natural filtration, and we will simply write that Xk, k ∈ N is a homogeneous Markov chain.

Proposition 1.22 Let P be a Markov kernel on (X,X ) and ν be a probability measure on (X,X ). An X-valued stochastic process Xk, k ∈ N is a homogeneous Markov chain with kernel P and initial distribution ν if and only if, for all k ∈ N, the distribution of (X0, . . . , Xk) is ν ⊗ P⊗k.

Applying (1.15) with a function f which depends only on the last coordinate yields that the distribution of Xk is νPk.

Proof (of Proposition 1.22). Fix k ≥ 0. Let Hk be the vector space of measurable functions f ∈ Fb(X^{k+1}, X ⊗(k+1)) such that

E [ f (X0, . . . , Xk)] = ν ⊗ P⊗k( f ) . (1.15)


Let fn, n ∈ N be an increasing sequence of nonnegative functions in Hk such that limn→∞ fn = f with f bounded. By the monotone convergence theorem, f belongs to Hk. By Theorem A.21, the proof will be concluded if we moreover check that Hk contains the functions of the form

f0(x0) · · · fk(xk) , f0, . . . , fk ∈ Fb(X,X ) . (1.16)

We prove this by induction. For k = 0, (1.15) reduces to E [ f0(X0)] = ν( f0), which means that ν is the distribution of X0. For k ≥ 1, assume that (1.15) holds for k − 1 and f of the form (1.16). Then,

E [∏_{j=0}^{k} fj(Xj)] = E [∏_{j=0}^{k−1} fj(Xj) E [ fk(Xk)|F_{k−1}]]
= E [∏_{j=0}^{k−1} fj(Xj) P fk(X_{k−1})] = ν ⊗ P^{⊗(k−1)}( f0 ⊗ · · · ⊗ f_{k−1} P fk)
= ν ⊗ P^{⊗k}( f0 ⊗ · · · ⊗ fk) .

The last equality holds since P( f Pg) = P ⊗ P( f ⊗ g). This concludes the induction and the direct part of the proof.

Conversely, assume that (1.15) holds. This obviously implies that ν is the distribution of X0. We must prove that, for each k ≥ 1, f ∈ F+(X,X ) and each F^X_{k−1}-measurable random variable Y,

E [ f (Xk)Y ] = E [P f (X_{k−1})Y ] . (1.17)

Let Gk be the set of F^X_{k−1}-measurable random variables Y satisfying (1.17). Gk is a vector space and, if Yn, n ∈ N is an increasing sequence of nonnegative random variables such that Y = limn→∞ Yn is bounded, then Y ∈ Gk by the monotone convergence theorem. Property (1.15) implies (1.17) for Y = f0(X0) f1(X1) · · · f_{k−1}(X_{k−1}), where fj ∈ Fb(X,X ) for j ≥ 0. The proof is concluded as previously by applying Theorem A.22. □

Under weak conditions on the structure of the state space X, every homogeneous Markov chain Xk, k ∈ N with values in X may be represented as a functional autoregressive process, i.e., Xk+1 = f (Xk, Zk+1), where Zk, k ∈ N is an i.i.d. sequence of random variables with values in a measurable space (Z,Z ), X0 is independent of Zk, k ∈ N, and f is a measurable function from (X × Z, X ⊗ Z ) into (X,X ).

This can be easily proved for a real-valued Markov chain Xk, k ∈ N with initial distribution ν and Markov kernel P. Let X be a real-valued random variable and let F(x) = P(X ≤ x) be the cumulative distribution function of X. Let F^{−1} be the quantile function, defined as the generalized inverse of F by

F^{−1}(u) = inf{x ∈ R : F(x) ≥ u} . (1.18)


The right continuity of F implies that u ≤ F(x) ⇔ F^{−1}(u) ≤ x. Therefore, if Z is uniformly distributed on [0,1], then F^{−1}(Z) has the same distribution as X, since P(F^{−1}(Z) ≤ t) = P(Z ≤ F(t)) = F(t) = P(X ≤ t).

Define F0(t) = ν((−∞, t]) and g = F0^{−1}. Consider the function F from R × R to [0,1] defined by F(x, x′) = P(x, (−∞, x′]). Then, for each x ∈ R, F(x, ·) is a cumulative distribution function. Let the associated quantile function f (x, ·) be defined by

f (x, u) = inf{x′ ∈ R : F(x, x′) ≥ u} . (1.19)

The function (x, u) 7→ f (x, u) is Borel measurable since (x, x′) 7→ F(x, x′) is itself a Borel measurable function. If Z is uniformly distributed on [0,1], then, for all x ∈ R and A ∈ B(R), we have

P( f (x, Z) ∈ A) = P(x, A) .

Let Zk, k ∈ N be a sequence of i.i.d. random variables, uniformly distributed on [0,1]. Define a sequence of random variables Xk, k ∈ N by X0 = g(Z0) and, for k ≥ 0,

Xk+1 = f (Xk, Zk+1) .
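The inverse-CDF construction above can be imitated on a finite state space: each row of the kernel is turned into a CDF, and the chain is driven by i.i.d. uniforms. The kernel, the Dirac initial distribution, and the seed below are illustrative choices.

```python
import random

def make_cdfs(N):
    """Cumulative sums of each row: F(x, x') = N(x, {0, ..., x'})."""
    cdfs = []
    for row in N:
        acc, c = 0.0, []
        for p in row:
            acc += p
            c.append(acc)
        cdfs.append(c)
    return cdfs

def f(cdfs, x, u):
    """Generalized inverse f(x, u) = inf{x' : F(x, x') >= u}."""
    for xp, F in enumerate(cdfs[x]):
        if F >= u:
            return xp
    return len(cdfs[x]) - 1

N = [[0.9, 0.1],
     [0.4, 0.6]]          # hypothetical two-state Markov kernel
cdfs = make_cdfs(N)
rng = random.Random(0)    # the Z_k: i.i.d. uniform on [0, 1]

x, path = 0, [0]          # start from state 0 (Dirac initial distribution)
for _ in range(10):
    x = f(cdfs, x, rng.random())   # X_{k+1} = f(X_k, Z_{k+1})
    path.append(x)
assert len(path) == 11 and set(path) <= {0, 1}
```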

Then Xk, k ∈ N is a Markov chain with Markov kernel P and initial distribution ν. We state without proof a general result, for reference only, since it will not be needed in the sequel.

Theorem 1.23. Let (X,X ) be a measurable space and assume that X is countably generated. Let P be a Markov kernel and ν be a probability measure on (X,X ). Let Zk, k ∈ N be a sequence of i.i.d. random variables uniformly distributed on [0,1]. There exist a measurable mapping g from ([0,1], B([0,1])) to (X,X ) and a measurable mapping f from (X × [0,1], X ⊗ B([0,1])) to (X,X ) such that the sequence Xk, k ∈ N defined by X0 = g(Z0) and Xk+1 = f (Xk, Zk+1) for k ≥ 0 is a Markov chain with initial distribution ν and Markov kernel P.

From now on, we will almost exclusively deal with homogeneous Markov chains and will, for simplicity, omit the word homogeneous in the statements.

1.4 The Canonical Chain

In this section, we show that, given an initial distribution ν ∈ M1(X ) and a Markov kernel P on X × X , we can construct a Markov chain with initial distribution ν and transition kernel P on a specific filtered probability space, referred to as the canonical space. The following construction remains valid for general measurable spaces (X,X ).

Definition 1.24 (Coordinate process) Let XN be the set of X-valued sequences ω = (ω0, ω1, ω2, . . .), endowed with the product σ-field X ⊗N. The coordinate process Xk, k ∈ N is defined by

Xk(ω) = ωk , ω ∈ XN . (1.20)

Theorem 1.25. Let P be a Markov kernel on X × X and ν ∈ M1(X ). Then there exists a unique probability measure Pν on (XN, X ⊗N) such that the coordinate process is a Markov chain with initial distribution ν and kernel P.

Proof. Set, for k ∈ N and f ∈ F+(X^{k+1}, X ⊗(k+1)),

µk( f ) = ν ⊗ P⊗k( f ) .

By Lemma 1.16, µk ∈ M1(X ⊗(k+1)) and the family of finite-dimensional distributions µk, k ∈ N satisfies the consistency condition (A.8). We conclude by applying Theorem A.31. □

The expectation associated with Pν will be denoted by Eν and, for x ∈ X, Px and Ex will be shorthand for Pδx and Eδx.

Proposition 1.26 For all A ∈ X ⊗N,

(i) the function x 7→ Px(A) is X -measurable;
(ii) for all ν ∈ M1(X ), Pν(A) = ∫_X Px(A) ν(dx).

Proof. Let M be the set of those A ∈ X ⊗N satisfying (i) and (ii). The set M is a monotone class and contains all the sets of the form ∏_{i=1}^{n} Ai, Ai ∈ X , n ∈ N, by (1.15). Hence, Theorem A.19 shows that M = X ⊗N. □

Definition 1.27 (Canonical Markov Chain) The canonical Markov chain with kernel P on X × X is the coordinate process Xn, n ∈ N on the canonical filtered space (XN, X ⊗N, F^X_k, k ∈ N), endowed with the family of probability measures {Pν, ν ∈ M1(X )} given by Theorem 1.25.

In the sequel, unless explicitly stated otherwise, a Markov chain with kernel P on (X,X ) will refer to the canonical chain on (XN, X ⊗N).

One must be aware that the canonical Markov chain on the canonical space XN comes with an infinite family of probability measures, indexed by the set of probability measures on X. A property might be almost surely true with respect to one such probability measure Pµ and almost surely false with respect to another one Pν.

Definition 1.28 A property is said to hold P∗-a.s. if it holds Pν-a.s. for every initial distribution ν.

Proposition 1.26 shows that the probabilities Px play a particular role: a property holds P∗-a.s. if and only if it holds Px-a.s. for all x ∈ X.


1.5 Invariant Measures and Stationarity

Definition 1.29 (Invariant measure) Let P be a Markov kernel on (X,X ). A nonzero σ-finite positive or signed measure µ ∈ M+(X ) ∪ Ms(X ) is said to be invariant with respect to P (or P-invariant) if µP = µ.

If an invariant measure is finite, it may be normalized to an invariant probability measure. The fundamental role of an invariant probability measure is illustrated by the following result. Recall that a stochastic process Xk, k ∈ N defined on a probability space (Ω, F, P) is said to be stationary if, for any integers k, p ≥ 0, the distribution of the random vector (Xk, . . . , Xk+p) does not depend on k.

Theorem 1.30. Let (Ω, F, P) be a probability space and let P be a Markov kernel on a measurable space (X,X ). A Markov chain Xk, k ∈ N defined on (Ω, F, P) with kernel P is a stationary process if and only if its initial distribution is invariant with respect to P.

Proof. Let π denote the initial distribution of the chain. If the chain Xk is stationary, then the marginal distribution is constant in k. In particular, the distribution of X1, which is πP, equals the distribution π of X0; that is, πP = π and π is invariant. Conversely, if πP = π, then πPh = π for all h ≥ 1. For all integers h and n, by Proposition 1.22, the distribution of (Xh, . . . , Xh+n) is (πPh) ⊗ P⊗n. Since πPh = π, it does not depend on h. □
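On a finite state space, the invariance equation πP = π can be found numerically by iterating µ 7→ µP when the iterates converge, as they do for the hypothetical two-state kernel below (whose fixed point can be checked by hand to be (0.8, 0.2)).

```python
def step(mu, N):
    """One application of the kernel to a measure: (mu N)(y) = sum_x mu(x) N(x, y)."""
    d = len(N)
    return [sum(mu[x] * N[x][y] for x in range(d)) for y in range(d)]

N = [[0.9, 0.1],
     [0.4, 0.6]]          # hypothetical two-state Markov kernel
pi = [0.5, 0.5]
for _ in range(200):      # iterate mu P^n; here the iterates converge fast
    pi = step(pi, N)

# pi is now (numerically) invariant: pi P = pi, and pi = (0.8, 0.2)
assert all(abs(a - b) < 1e-12 for a, b in zip(step(pi, N), pi))
assert abs(pi[0] - 0.8) < 1e-9 and abs(pi[1] - 0.2) < 1e-9
```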

If a Markov kernel P admits an invariant measure π, then P can be seen as a weakly contracting (i.e. 1-Lipschitz) operator on the space Lp(π) of measurable functions f such that π(| f |p) < ∞.

Proposition 1.31 Let P be a Markov kernel on (X,X ) with invariant distribution π. For every p ∈ [1,∞], P is an Lp(π) contraction, i.e. for all f ∈ Lp(π), P f ∈ Lp(π) and π(|P f |p) ≤ π(| f |p).

Proof. Let f ∈ F+(X,X ) be such that π( f p) < ∞. By Jensen's inequality, (P f )p ≤ P( f p). Hence,

π((P f )p) ≤ π(P( f p)) = (πP)( f p) = π( f p) . □

For a finite signed measure ξ on (X,X ), we denote by ξ+ and ξ− the positive and negative variations of ξ (see Definition 6.2). Recall that, for any A ∈ X , ξ(A) = ξ+(A) − ξ−(A), and that ξ+ and ξ− are mutually singular.

Lemma 1.32 Let P be a Markov kernel and λ be an invariant signed measure. Then λ+ is also invariant.

Proof. Let S ∈ X be a Jordan set for λ, so that λ+(A) = λ(A ∩ S) for all A ∈ X . For any B ∈ X ,

λ+P(B) ≥ λ+P(B ∩ S) = ∫ P(x, B ∩ S) λ+(dx)
≥ ∫ P(x, B ∩ S) λ(dx) = λ(B ∩ S) = λ+(B) ,

where the middle inequality uses λ+ ≥ λ and the last two equalities use the invariance of λ and the definition of S. Since λ+P(X) = λ+(X), the latter inequality implies that λ+P = λ+. □

Definition 1.33 (Absorbing set) A set B ∈ X is called absorbing if P(x,B) = 1 for all x ∈ B.

In other words, the set B is “programmed to receive, but you can never leave”.

Theorem 1.34. Let P be a Markov kernel on (X,X ). Then:

(i) The set of invariant probability measures for P is a convex subset of the convex cone M+(X ).
(ii) For any two distinct invariant probability measures π, π′ for P, the finite measures ϕ := (π − π′)+ and ψ := (π − π′)− are non-trivial, mutually singular and invariant for P.
(iii) Let π be an invariant probability measure and X1 ⊂ X with π(X1) = 1. There exists B ⊂ X1 such that π(B) = 1 and P(x,B) = 1 for all x ∈ B (i.e. B is absorbing for P).

Proof. (i) P acts as a linear operator on M+(X ). Therefore, if π, π′ are two invariant probability measures for P, then for every scalar a ∈ [0,1], using first the linearity and then the invariance,

(aπ + (1 − a)π′)P = aπP + (1 − a)π′P = aπ + (1 − a)π′ .

(ii) We apply Lemma 1.32 to the signed measure λ = π − π′. The measures ϕ = (π − π′)+ and ψ = (π − π′)− are mutually singular, invariant and non-trivial since

(π − π′)+(X) = (π − π′)−(X) = (1/2)|π − π′|(X) > 0 .

(iii) The invariance of π implies that

π(X1) = 1 = ∫_{X1} P(x,X1) π(dx) .

Therefore, there exists a set X2 ∈ X such that

X2 ⊂ X1 , π(X2) = 1 and P(x,X1) = 1 for all x ∈ X2.

Repeating the above argument, we obtain a nonincreasing sequence Xi, i ≥ 1 of sets Xi ∈ X such that π(Xi) = 1 for all i = 1, 2, . . . and P(x,Xi) = 1 for all x ∈ Xi+1. Define B := ⋂_{i=1}^{∞} Xi ∈ X . The set B is nonempty because


π(B) = π(⋂_{i=1}^{∞} Xi) = lim_{i→∞} π(Xi) = 1 .

The set B is absorbing for P because, for any x ∈ B,

P(x,B) = P(x, ⋂_{i=1}^{∞} Xi) = lim_{i→∞} P(x,Xi) = 1 . □

In general, there may exist more than one invariant probability measure, or none at all when X is not finite. As a trivial example of the latter, consider X = N and P(x, x + 1) = 1 for all x ∈ N: the chain drifts to infinity and admits no invariant probability measure.

1.6 Reversibility

Definition 1.35 Let P be a Markov kernel on (X,X ). A σ-finite measure ξ on X is said to be reversible with respect to P if, for all (A,B) ∈ X × X ,

ξ ⊗ P(A × B) = ξ ⊗ P(B × A) . (1.21)

Equivalently, reversibility means that, for all bounded measurable functions f defined on (X × X, X ⊗ X ),

∫∫_{X×X} ξ(dx) P(x,dx′) f (x, x′) = ∫∫_{X×X} ξ(dx) P(x,dx′) f (x′, x) . (1.22)

If X is a denumerable state space, a (finite or σ-finite) measure ξ is reversible w.r.t. P if and only if, for all (x, x′) ∈ X × X,

ξ(x) P(x, x′) = ξ(x′) P(x′, x) . (1.23)

This condition is referred to as the detailed balance condition. If Xk, k ∈ N is a Markov chain with kernel P and initial distribution ξ, the reversibility condition (1.21) means precisely that (X0, X1) and (X1, X0) have the same distribution. This implies in particular that the distribution of X1 is the same as that of X0, i.e. that ξ is P-invariant. Thus reversibility implies invariance. This property extends to all finite-dimensional distributions.
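The detailed balance condition (1.23) is easy to verify numerically on a finite state space. The measure and kernel below are hypothetical, with ξ chosen to be invariant for N.

```python
def detailed_balance(xi, N, tol=1e-12):
    """Check xi(x) N(x, y) == xi(y) N(y, x) for all pairs of states."""
    d = len(N)
    return all(abs(xi[x] * N[x][y] - xi[y] * N[y][x]) <= tol
               for x in range(d) for y in range(d))

N = [[0.9, 0.1],
     [0.4, 0.6]]          # hypothetical two-state Markov kernel
xi = [0.8, 0.2]           # its invariant probability (check: xi N = xi)

# any two-state chain is reversible w.r.t. its positive invariant measure,
# and indeed 0.8 * 0.1 = 0.2 * 0.4
assert detailed_balance(xi, N)
```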

Proposition 1.36 Let P be a Markov kernel on (X,X ) and ξ ∈ M1(X ). If ξ is reversible w.r.t. P, then

(i) ξ is P-invariant;
(ii) the process Xk, k ∈ N is reversible, i.e. for any p ∈ N, (X0, . . . , Xp) and (Xp, . . . , X0) have the same distribution.

Proof. (i) Using (1.21) with A = X and B ∈ X , we get

ξP(B) = ξ ⊗ P(X × B) = ξ ⊗ P(B × X) = ∫ ξ(dx) 1B(x) P(x,X) = ξ(B) .

(ii) It suffices to prove that, for every p ≥ 1 and f ∈ F+(X^{p+1}, X ⊗(p+1)),

Eξ [ f (X0, . . . ,Xp)] = Eξ [ f (Xp, . . . ,X0)] . (1.24)

We prove (1.24) by induction on p. For p = 1, the claim follows from (1.22):

Eξ [ f (X0,X1)] = ∫∫ ξ(dx0) P(x0,dx1) f (x0,x1) = ∫∫ ξ(dx0) P(x0,dx1) f (x1,x0) = Eξ [ f (X1,X0)] .

Assume now that (1.24) holds for some p ≥ 1. Define the measurable function g on X^{p+1} by

g(x0, . . . ,xp) = ∫_X P(xp,dx_{p+1}) f (x0, . . . ,xp,x_{p+1}) .

By the Markov property and the induction assumption, we get

Eξ [ f (X0, . . . ,X_{p+1})] = Eξ [Eξ [ f (X0, . . . ,X_{p+1}) |Fp]] = Eξ [g(X0, . . . ,Xp)] = Eξ [g(Xp, . . . ,X0)] .

Since ξ is invariant, the chain is stationary, hence Eξ [g(Xp, . . . ,X0)] = Eξ [g(X_{p+1}, . . . ,X1)]. Since

g(x_{p+1}, . . . ,x1) = ∫ f (x_{p+1}, . . . ,x1,x0) P(x1,dx0) ,

the previous relation yields, using that ξ(dx0) P(x0,dx1) = ξ(dx1) P(x1,dx0),

Eξ [ f (X0, . . . ,X_{p+1})] = ∫ · · · ∫ ξ(dx1) P(x1,dx0) ∏_{i=2}^{p+1} P(x_{i−1},dxi) f (x_{p+1}, . . . ,x1,x0)
= ∫ · · · ∫ ξ(dx0) P(x0,dx1) ∏_{i=2}^{p+1} P(x_{i−1},dxi) f (x_{p+1}, . . . ,x1,x0)
= Eξ [ f (X_{p+1}, . . . ,X0)] ,

which establishes (1.24). □

1.7 Topological properties of kernels

So far, we have considered Markov kernels on abstract state spaces without any topological structure. In the overwhelming majority of examples, the state space will be a metric space (X,d) endowed with its Borel σ-field, and we will take advantage of this structure to obtain results on existence of, uniqueness of, and convergence to the invariant measure. In this section we state topological properties of kernels which will be used in the following chapters and explored in much greater detail in Chapter 10.

Recall that a sequence of probability measures µn, n ∈ N on a metric space (X,d) is said to converge weakly to a probability measure µ (which we denote by µn ⇒ µ) if limn→∞ µn( f ) = µ( f ) for all functions f ∈ Cb(X), the space of real-valued bounded continuous functions on X. The space Cb(X), endowed with the topology of uniform convergence, is a Banach space. We have already seen in Proposition 1.9 that a Markov kernel P maps bounded functions to bounded functions. This is not necessarily the case for continuous functions, so this property must be assumed.

Definition 1.37 (Feller kernel) A Markov kernel P on a metric space (X,d) iscalled a Feller kernel if P f ∈ Cb(X) for all f ∈ Cb(X).

Equivalently, the kernel P is Feller if, for every sequence xn, n ∈ N in X such that limn→∞ xn = x, the sequence of probability measures P(xn, ·), n ∈ N converges weakly to P(x, ·), i.e. δ_{xn}P ⇒ δxP, or, for all f ∈ Cb(X), limn→∞ P f (xn) = P f (x). If the kernel P is Feller, then Pn is also a Feller kernel for all n ∈ N, and the sampled kernel Ka is Feller for any probability measure a ∈ M1(N).

We have seen that any Markov chain can be expressed as a random iterative system. If the link function has some smoothness property, then it defines a Feller kernel.

Proposition 1.38 Let (X,d) be a metric space, let (Z,Z ) be a measurable space, let µ be a probability measure on (Z,Z ), let Z be a random variable with distribution µ, and let F : X × Z → X be a measurable function. Let P be the Markov kernel associated to the function F and the measure µ, defined for x ∈ X and A ∈ X by

P(x,A) = P(F(x,Z) ∈ A) = ∫ 1A(F(x,z)) µ(dz) .

If F(·,z) is continuous for µ-almost all z ∈ Z, then P is a Feller kernel.

Proof. Let f ∈ Cb(X) and x ∈ X. By assumption, the function x 7→ f (F(x,z)) is bounded and continuous for µ-almost all z. By Lebesgue's dominated convergence theorem, this implies that P f (x) = E [ f (F(x,Z))] is continuous. □
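A numerical illustration of Proposition 1.38, with hypothetical choices throughout: link function F(x, z) = x/2 + z, Z uniform on [0,1], and f = tanh. Estimating Pf with common random numbers yields a function of x that inherits the continuity (here, Lipschitz continuity) of F.

```python
import math
import random

def Pf(x, f, n=5000, seed=1):
    """Monte-Carlo estimate of P f(x) = E[f(F(x, Z))] for the hypothetical
    continuous link F(x, z) = x / 2 + z with Z uniform on [0, 1].  A fixed
    seed reuses the same z's for every x, so x -> Pf(x) is continuous."""
    rng = random.Random(seed)
    return sum(f(x / 2 + rng.random()) for _ in range(n)) / n

# f = tanh is bounded, continuous and 1-Lipschitz, so with the same
# uniforms the estimate moves by at most |x - x'| / 2:
gap = abs(Pf(0.001, math.tanh) - Pf(0.0, math.tanh))
assert gap <= 0.001 / 2 + 1e-12
```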

Proposition 1.39 Let P be a Feller kernel on a complete metric space (X, d). Then P is a bounded linear operator on C_b(X) endowed with the supremum norm |·|_∞ and a weakly continuous operator on M_1(X).

Proof. For f ∈ C_b(X) and x ∈ X,

    |Pf(x)| ≤ ∫_X |f(y)| P(x, dy) ≤ |f|_∞ P(x, X) = |f|_∞ .


This proves the first statement. Let {µ_n, n ∈ N} be a sequence of probability measures on (X, X) such that µ_n converges weakly to a probability measure µ. Then, for any f ∈ C_b(X), since Pf is also in C_b(X), we have

    lim_{n→∞} (µ_n P)(f) = lim_{n→∞} µ_n(Pf) = µ(Pf) = (µP)(f) .

This proves that the sequence {µ_n P, n ∈ N} converges weakly to µP. □

The weak continuity of a Feller kernel P, seen as an operator acting on probability measures, has many important consequences.

Proposition 1.40 Let P be a Feller kernel on a metric space (X, d). For µ ∈ M_1(X), define the sequence of probability measures µ_n = µP^n, n ≥ 1. If µ_n ⇒ π where π ∈ M_1(X), then π is P-invariant.

Proof. By the weak continuity of P, we have for all f ∈ C_b(X),

    πPf = lim_{n→∞} (µP^n)Pf = lim_{n→∞} µ_{n+1}f = πf .

Thus πP and π take equal values on all bounded continuous functions and are therefore equal by Theorem B.4. □

The assumption can be weakened: all the weak limit points of the Cesàro means n^{−1} ∑_{k=0}^{n−1} µP^k are P-invariant.

Proposition 1.41 Let P be a Feller kernel on a metric space (X, d). For µ ∈ M_1(X), define the sequence {π^µ_n, n ∈ N} of probability measures

    π^µ_n = n^{−1} ∑_{k=0}^{n−1} µP^k .    (1.25)

All the weak limit points of {π^µ_n, n ∈ N} are P-invariant.

Proof. Let π be a weak limit point of {π^µ_n, n ∈ N}. There exists a subsequence {π^µ_{n_k}, k ∈ N} which converges weakly to π ∈ M_1(X). Since P is Feller, Pf ∈ C_b(X) for any f ∈ C_b(X). Thus,

    |πP(f) − π(f)| = |π(Pf) − π(f)| = lim_{k→∞} |π^µ_{n_k}(Pf) − π^µ_{n_k}(f)|
                   = lim_{k→∞} (1/n_k) |µP^{n_k}(f) − µ(f)| ≤ lim_{k→∞} 2|f|_∞/n_k = 0 .

This proves that π = πP by invoking Theorem B.4 as previously. □

The previous result provides a method to establish the existence of an invariant measure. The uniqueness of this invariant measure is more difficult to establish in general; we will provide in later chapters different criteria to check that the invariant distribution is unique. The following elementary result proves, under rather strong assumptions, the existence and uniqueness of an invariant measure.


Proposition 1.42 Let P be a Feller kernel on a metric space (X, d) and π ∈ M_1(X). Assume that, for all x ∈ X, the sequence {δ_x P^n, n ∈ N} converges weakly to π. Then π is the unique P-invariant probability measure and ξP^n ⇒ π for any ξ ∈ M_1(X).

Proof. Proposition 1.40 shows that π is invariant. Let f ∈ C_b(X). For all x ∈ X, P^n f(x) = δ_x P^n(f) → π(f) as n → ∞ and |P^n f(x)| ≤ |f|_∞. Therefore, for any probability measure ξ, the bounded convergence theorem yields

    lim_{n→∞} ξP^n(f) = lim_{n→∞} ∫ P^n f(x) ξ(dx) = π(f) .    (1.26)

This proves that ξP^n ⇒ π. Let π′ be an invariant probability measure. Then π′P^n = π′ for all n ∈ N and, by (1.26) applied with ξ = π′, we know that π′P^n ⇒ π, hence π′ = π. □

1.8 Markov chains of order p

There are many examples of stochastic processes which are not Markov chains but which can easily be embedded in a Markov chain.

Definition 1.43 (Markov chain of order p) Let p ≥ 1 be an integer, (X, X) a measurable space and (Ω, F, {F_k, k ∈ N}, P) a filtered probability space. An adapted stochastic process {(X_k, F_k), k ∈ N} is called a Markov chain of order p if the process {(X_k, ..., X_{k+p−1}), k ∈ N} is a Markov chain with values in X^p.

Let {X_k, k ∈ N} be a Markov chain of order p ≥ 2 and let K_p be the kernel of the chain {X̄_k, k ∈ N} with X̄_k = (X_k, ..., X_{k+p−1}), that is

    P(X̄_1 ∈ A_1 × ··· × A_p | X̄_0 = (x_0, ..., x_{p−1})) = K_p((x_0, ..., x_{p−1}), A_1 × ··· × A_p) .

Since X̄_0 and X̄_1 have p − 1 common components, the kernel K_p has a particular form. More precisely, defining the kernel K on X^p × X by

    K((x_0, ..., x_{p−1}), A) = K_p((x_0, ..., x_{p−1}), X^{p−1} × A)
                             = P(X_p ∈ A | X_0 = x_0, ..., X_{p−1} = x_{p−1}) ,

we obtain that

    K_p((x_0, ..., x_{p−1}), A_1 × ··· × A_p) = δ_{x_1}(A_1) ··· δ_{x_{p−1}}(A_{p−1}) K((x_0, ..., x_{p−1}), A_p) .

We thus see that an equivalent definition of a homogeneous Markov chain of order p is the existence of a kernel K on X^p × X such that for all n ≥ 0,

    P(X_{n+p} ∈ A | F^X_{n+p−1}) = K((X_n, ..., X_{n+p−1}), A) .


Similarly to Theorem 1.23, if (X, X) is countably generated, a Markov chain of order p can be expressed as a functional autoregressive process of order p: there exist an i.i.d. sequence {Z_k, k ∈ N} of random variables uniformly distributed on [0, 1] and a measurable function F defined on X^p × [0, 1] such that

    X_{k+p+1} = F(X_{k+1}, ..., X_{k+p}, Z_{k+p+1}) .

1.9 Discrete State Space Examples

1.9.1 Random walks on Z^d

Let {Z_n, n ∈ N*} be a sequence of i.i.d. random variables with values in Z^d and distribution ν. Let X_0 be a random variable in Z^d independent of {Z_n, n ∈ N*}. A random walk with jump distribution ν is a process {X_k, k ∈ N} defined by X_0 and the recursion

    X_{n+1} = X_n + Z_{n+1} .

This is a Markov chain with kernel P defined on Z^d × Z^d by

    P(x, y) = ν(y − x) ,  x, y ∈ Z^d .

The random walk is spatially homogeneous in the sense that P(x, y) = P(0, y − x) = ν(y − x), showing that the kernel is determined by the jump distribution ν. If the distribution ν is symmetric, i.e. ν(j) = ν(−j), the random walk is said to be symmetric. The so-called simple random walks constitute a particularly important class of random walks. Denote by ‖x‖ the Euclidean norm of x ∈ Z^d. Then P defines the d-dimensional simple random walk if P(x, y) = 1/(2d) when ‖y − x‖ = 1 and P(x, y) = 0 otherwise. When d = 1, we define the Bernoulli random walk with kernel

    P(x, x+1) = p ,  P(x, x−1) = q ,  p ≥ 0 ,  q ≥ 0 ,  p + q = 1 .

For n ∈ N, P^n(0, x) is the probability of an n-step transition from 0 to x (the probability that a "particle" starting at zero finds itself at x after n iterations). Suppose that n and x have the same parity and that |x| ≤ n (otherwise P^n(0, x) = 0). Then P^n(0, x) is the probability of (n + x)/2 successes in n independent Bernoulli trials, where the probability of success is p. Therefore

    P^n(0, x) = \binom{n}{(n+x)/2} p^{(n+x)/2} q^{(n−x)/2}

if n + x is even and |x| ≤ n, and P^n(0, x) = 0 otherwise.
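The closed-form expression can be checked against direct iteration of the one-step kernel. The following sketch uses exact rational arithmetic; the values p = 1/3, q = 2/3 are hypothetical, any pair with p + q = 1 would do.

```python
from fractions import Fraction
from math import comb

def kernel_step(dist, p, q):
    """One step of the Bernoulli random walk kernel on Z."""
    new = {}
    for x, w in dist.items():
        new[x + 1] = new.get(x + 1, Fraction(0)) + w * p
        new[x - 1] = new.get(x - 1, Fraction(0)) + w * q
    return new

def pn_formula(n, x, p, q):
    """Closed-form n-step transition probability P^n(0, x)."""
    if (n + x) % 2 != 0 or abs(x) > n:
        return Fraction(0)
    k = (n + x) // 2
    return comb(n, k) * p**k * q**(n - k)

p, q = Fraction(1, 3), Fraction(2, 3)
dist = {0: Fraction(1)}          # chain started at 0
for _ in range(6):
    dist = kernel_step(dist, p, q)

# the iterated kernel agrees with the binomial formula for every site
for x in range(-7, 8):
    assert dist.get(x, Fraction(0)) == pn_formula(6, x, p, q)
```

The exact `Fraction` arithmetic avoids any floating-point tolerance in the comparison.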


1.9.2 Reflected and absorbed random walk on N

If the simple random walk is restricted to the nonnegative integers, then the zero state is called a barrier. If P(0, 1) = 1, it is called a reflecting barrier. If P(0, 0) = 1, then 0 is called an absorbing barrier: once the particle reaches zero, it remains there forever. If P(0, 1) > 0 and P(0, 0) > 0, then 0 is called a partially reflecting barrier.

If the simple random walk is restricted to a finite number of states, say {0, 1, 2, ..., a}, then both the states 0 and a may be reflecting, absorbing, or partially reflecting barriers.

A simple random walk on Z can be used to describe the fortune of a gambler engaged in a series of games whose outcome is winning or losing one unit; in that case P(x, x) = 0. If the player cannot be indebted or borrow money, then the state 0 is an absorbing barrier and reaching it is called the gambler's ruin. A reflecting barrier at a > 0 means that the gambler cashes in all gains above a, and an absorbing barrier at a means that the gambler stops playing as soon as their fortune reaches the level a.

1.9.3 Level-dependent quasi Birth-and-death process

A level-dependent quasi birth-and-death process is a Markov chain on the finite or infinite subset X = {a, a+1, ..., b} (−∞ ≤ a < b ≤ ∞) of Z with kernel P defined by

    P(x, x+1) = p_x ,  P(x, x−1) = q_x ,  P(x, x) = r_x ,

with p_x + q_x + r_x = 1 for all x ∈ X.

This process can be used to describe the position of a particle moving on a grid, which at each step may only remain at the same state or move to an adjacent state, with probabilities possibly depending on the state.

If P(0, 0) = 1 and p_x + q_x = 1 for x > 0, this process may be considered as a model for the size of a population, recorded each time it changes, p_x being the probability that a birth occurs before a death when the size of the population is x. Birth-and-death processes have many applications in demography, queueing theory, performance engineering and biology. They may be used to study the size of a population, the number of infected individuals within a population, or the number of customers waiting in a queue for service.

1.9.4 Ehrenfest’s Urn

This model, also called the dog-flea model, is a Markov chain on the finite state space {0, ..., N}, where N > 1 is a fixed integer. Balls (or particles) numbered 1 to N are


divided among two urns A and B. At each step, an integer i is drawn uniformly at random and the ball numbered i is moved to the other urn. The number X_n of balls in urn A at time n is a Markov chain on {0, ..., N} with transition matrix P defined by

    P(i, i+1) = (N − i)/N ,  i = 0, ..., N−1 ,
    P(i, i−1) = i/N ,  i = 1, ..., N .

The states 0 and N are reflecting barriers. The binomial distribution B(N, 1/2) is reversible with respect to the kernel P. Indeed, for all i = 0, ..., N−1,

    \binom{N}{i} (N − i)/N = N! / (i! (N − i − 1)! N) = \binom{N}{i+1} (i + 1)/N .

This is the detailed balance condition of Definition 1.35 (the common normalizing factor 2^{−N} cancels on both sides). Thus the binomial distribution B(N, 1/2) is invariant.

For n ≥ 1,

    E[X_n | X_{n−1}] = (X_{n−1} + 1)(N − X_{n−1})/N + (X_{n−1} − 1) X_{n−1}/N = X_{n−1}(1 − 2/N) + 1 .

Setting m_n(x) = E_x[X_n] for x ∈ {0, ..., N} and a = 1 − 2/N, this yields

    m_n(x) = a m_{n−1}(x) + 1 .

The solution of this recurrence equation is

    m_n(x) = x a^n + (1 − a^n)/(1 − a) ,

and since 0 ≤ a < 1, this yields lim_{n→∞} E_x[X_n] = 1/(1 − a) = N/2, which is the expectation of the stationary distribution.
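Both the detailed balance identity and the limit of the mean recursion can be verified numerically. A minimal sketch, with the hypothetical choice N = 6:

```python
from fractions import Fraction
from math import comb

N = 6
# Binomial(N, 1/2) weights, unnormalized: the 2^-N factor cancels in detailed balance
pi = [Fraction(comb(N, i)) for i in range(N + 1)]

def P(i, j):
    """Ehrenfest transition probabilities."""
    if j == i + 1:
        return Fraction(N - i, N)
    if j == i - 1:
        return Fraction(i, N)
    return Fraction(0)

# detailed balance: pi(i) P(i, i+1) == pi(i+1) P(i+1, i)
for i in range(N):
    assert pi[i] * P(i, i + 1) == pi[i + 1] * P(i + 1, i)

# mean recursion m_n = a m_{n-1} + 1 converges to N/2
a = Fraction(N - 2, N)
m = Fraction(0)              # start with all balls in urn B
for _ in range(200):
    m = a * m + 1
assert abs(float(m) - N / 2) < 1e-9
```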

1.9.5 Wright-Fisher model

The Wright-Fisher model is an idealized genetics model used to investigate the fluctuation of gene frequency in a population of constant size under the influence of mutation and selection. The model describes a simple haploid random reproduction, disregarding selective forces and mutation pressure. The size of the population is set to N individuals of two types, 1 and 2. Let X_n be the number of individuals of type 1 at time n. Then {X_n, n ∈ N} is a Markov chain with state space X = {0, 1, ..., N} and transition matrix

    P(j, k) = \binom{N}{k} (j/N)^k (1 − j/N)^{N−k} ,


with the usual convention 0^0 = 1. In words, given that the number of type 1 individuals in the current generation is j, the number of type 1 individuals in the next generation follows a binomial distribution with success probability j/N. Looking backwards, this can be interpreted as having each individual in the next generation "pick its parent at random" from the current population. The states 0 and N are absorbing. This will be further discussed in Section 3.10.5.
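Three basic properties of this transition matrix are easy to check by direct computation: each row is a probability distribution, the states 0 and N are absorbing, and the chain is a martingale (the conditional mean of the next generation equals the current count). A sketch with the hypothetical size N = 8:

```python
from fractions import Fraction
from math import comb

N = 8

def P(j, k):
    """Wright-Fisher transition probability: Binomial(N, j/N) evaluated at k."""
    pj = Fraction(j, N)
    return comb(N, k) * pj**k * (1 - pj)**(N - k)   # 0**0 == 1 matches the convention

# each row sums to one
for j in range(N + 1):
    assert sum(P(j, k) for k in range(N + 1)) == 1

# 0 and N are absorbing
assert P(0, 0) == 1 and P(N, N) == 1

# martingale property: E[X_{n+1} | X_n = j] = j
for j in range(N + 1):
    assert sum(k * P(j, k) for k in range(N + 1)) == j
```

The martingale property is what makes the absorption probabilities at 0 and N easy to compute later on.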

1.9.6 Discrete Time queueing system

Clients arrive for service and enter a queue. During each time interval a single customer is served, provided that at least one customer is present in the queue. We assume that the numbers of arrivals during successive service periods form a sequence {Z_n, n ∈ N} of i.i.d. integer-valued random variables, independent of the initial state X_0, whose distribution is given by

    P(Z_n = k) = a_k ≥ 0 ,  k ∈ N ,  ∑_{k=0}^∞ a_k = 1 .

The state of the queue at the start of each period is defined to be the number of clients waiting for service, which is given by

    X_{n+1} = (X_n − 1)_+ + Z_{n+1} .

For x ≥ 1, the Markov kernel of the chain is given by P(x, y) = a_{y−x+1} for y ≥ x − 1 and P(x, y) = 0 otherwise. On the other hand, P(0, y) = a_y for y ≥ 0.

Set m = E[Z_1]. For n ≥ 0,

    E[X_{n+1} | X_n] = (X_n − 1)_+ + m = X_n + m − 1 + 1_{X_n = 0}

(equal to X_n + m − 1 if X_n > 0 and to m if X_n = 0). This yields

    E_x[X_{n+1}] = E_x[X_n] + m − 1 + P_x(X_n = 0) .

If m > 1, then {X_n, n ∈ N} is a submartingale and

    E_x[X_n] ≥ x + n(m − 1) → ∞ .

Thus an invariant distribution cannot exist if m > 1. If m < 1, it will be proved in Section 3.10.6 that an invariant distribution exists.
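The conditional-mean identity E[X_{n+1} | X_n] = X_n + m − 1 + 1_{X_n=0} can be checked exactly for any finitely supported arrival distribution. A sketch with a hypothetical distribution a_0 = 1/2, a_1 = a_2 = 1/4:

```python
from fractions import Fraction

# hypothetical arrival distribution a_k
a = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
m = sum(k * w for k, w in a.items())          # E[Z] = 3/4 here, so m < 1

def cond_mean_next(x):
    """E[X_{n+1} | X_n = x] computed directly from X_{n+1} = (x-1)_+ + Z."""
    return sum((max(x - 1, 0) + k) * w for k, w in a.items())

# matches x + m - 1 + 1_{x=0} for every state
for x in range(6):
    assert cond_mean_next(x) == x + m - 1 + (1 if x == 0 else 0)
```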


1.9.7 Galton-Watson process

The Galton-Watson process is a branching process arising from Francis Galton's investigation of the extinction of family names. The process models family names as patrilineal (passed from father to son); offspring are either male or female, and a name becomes extinct if all holders of the family name die without male descendants. This model has been used in many different applications, including the survival probabilities of a new mutant gene, the initiation of a nuclear chain reaction, the dynamics of disease outbreaks in their first generations of spread, and the chances of extinction of a small population of organisms.

A Galton-Watson process is a stochastic process {X_n, n ∈ N} which evolves according to the recursion X_0 = 1 and

    X_{n+1} = ∑_{j=1}^{X_n} ξ_j^{(n+1)} ,    (1.27)

where {ξ_j^{(n+1)} : n, j ∈ N} is a set of i.i.d. nonnegative integer-valued random variables with distribution ν. The random variable X_n can be thought of as the number of descendants (along the male line) in the n-th generation, and ξ_j^{(n+1)}, j = 1, ..., X_n, represents the number of (male) children of the j-th descendant of the n-th generation.

The conditional distribution of X_{n+1} given the past depends only on the current size of the population X_n and on the numbers of offspring {ξ_j^{(n+1)}}_{j=1}^{X_n}, which are conditionally independent given the past. The process {X_n, n ∈ N} is therefore a homogeneous Markov chain whose transition matrix is given by P(0, 0) = 1 and, for j ∈ N* and k ∈ N,

    P(j, k) = ∑_{(k_1, ..., k_j) ∈ N^j, k_1 + ··· + k_j = k} ν(k_1) ν(k_2) ··· ν(k_j) = ν^{*j}(k) .

The state 0 is absorbing: the population is extinct forever once it reaches zero.
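The convolution formula P(j, k) = ν^{*j}(k) can be implemented directly for an offspring distribution with finite support. A sketch with a hypothetical distribution ν(0) = 1/4, ν(1) = 1/2, ν(2) = 1/4:

```python
from fractions import Fraction

# hypothetical offspring distribution nu
nu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def convolve(mu, rho):
    """Convolution of two finitely supported distributions on N."""
    out = {}
    for i, wi in mu.items():
        for j, wj in rho.items():
            out[i + j] = out.get(i + j, Fraction(0)) + wi * wj
    return out

def P(j, k):
    """Galton-Watson transition P(j, k) = nu^{*j}(k), with P(0, 0) = 1."""
    dist = {0: Fraction(1)}                 # nu^{*0} = delta_0
    for _ in range(j):
        dist = convolve(dist, nu)
    return dist.get(k, Fraction(0))

assert P(0, 0) == 1                          # state 0 is absorbing
assert sum(P(3, k) for k in range(7)) == 1   # nu^{*3} is a probability distribution
assert P(2, 2) == Fraction(3, 8)             # nu(0)nu(2) + nu(1)nu(1) + nu(2)nu(0)
```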

1.9.8 INAR process

An INAR (INteger-valued AutoRegressive) process is a Galton-Watson process with immigration, defined by the recursion X_0 = 1 and

    X_{n+1} = ∑_{j=1}^{X_n} ξ_j^{(n+1)} + Y_{n+1} ,    (1.28)


where {ξ_j^{(n)}, j, n ∈ N*} are i.i.d. integer-valued random variables and {Y_n, n ∈ N*} is a sequence of i.i.d. integer-valued random variables, independent of {ξ_j^{(n)}}. The random variable Y_{n+1} represents the "immigrants", that is, the part of the (n+1)-th generation which does not descend from the n-th generation. Contrary to the previous model, the state 0 is not absorbing.

Let ν be the distribution of ξ_1^{(1)} and µ be the distribution of Y_1. Then the transition matrix of the INAR process is given, for j ∈ N and k ∈ N, by

    P(j, k) = (µ ∗ ν^{*j})(k) .

We will study the existence of an invariant distribution in Section 3.10.3.

1.10 Time Series Examples

Many time series models are Markov chains, can be embedded into Markov chains, or use Markov chains as building blocks. We will describe in this chapter several well-known time series models in the framework of Markov chains, starting with linear processes, then nonlinear models, and finally state-space models.

1.10.1 Autoregressive processes

The AR(1) process {X_k, k ∈ N} is defined recursively as follows: the value X_k of the process at time k is an affine combination of the previous value X_{k−1} and an innovation (noise), i.e.

    X_k = µ + φ X_{k−1} + Z_k ,    (1.29)

where {Z_k, k ∈ N} is a sequence of i.i.d. real-valued random variables, independent of X_0. The AR(1) model can be seen as an extension of the random walk model

    X_k = X_{k−1} + Z_k .    (1.30)

These models can be extended to R^d, with Φ a d × d matrix and {Z_k, k ∈ N} an i.i.d. sequence of d-dimensional random vectors.

We assume that E[|Z_1|] < ∞ and E[Z_1] = 0. Then {X_k, k ∈ N} is a Markov chain with kernel

    P(x, A) = P(µ + φx + Z_1 ∈ A) ,  A ∈ B(R) .    (1.31)

Equivalently, for all h ∈ F_+(X, X) and x ∈ X,

    Ph(x) = E[h(µ + φx + Z_1)] .


If h ∈ Cb(X), Lebesgue’s dominated convergence theorem yields that x 7→ Ph(x) iscontinuous, thus P is a Feller kernel. By iterating (1.29) we get for all k ≥ 1,

Xk = φkX0 +

k−1

∑j=0

φj(µ +Zk− j) = φ

kX0 +1−φ k

1−φµ +Ak , (1.32)

with

Ak =k−1

∑j=0

φjZk− j .

Since Zk, k ∈ N is an i.i.d. sequence, for each k the two random variables Ak andBk = ∑

k−1j=0 φ jZ j have the same distribution. Thus, for every h ∈ Cb(X),

Ex[h(Xk)] = E[h(φ kx+µ(1−φ

k)/(1−φ)+Ak)]

= E[h(φ kx+µ(1−φ

k)/(1−φ)+Bk)].

Assume that |φ| < 1. Then {B_k, k ∈ N} is a martingale and is bounded in L^1(P), i.e.

    sup_{k≥0} E[|B_k|] ≤ E[|Z_0|] ∑_{j=0}^∞ |φ|^j < ∞ .

Hence, by the martingale convergence theorem (Theorem D.18),

    B_k → B_∞ = ∑_{j=0}^∞ φ^j Z_j   P-a.s.

Thus, for every h ∈ C_b(X), Lebesgue's dominated convergence theorem yields

    lim_{k→∞} δ_x P^k h = E[h(B_∞ + µ/(1 − φ))] .

Proposition 1.42 shows that the distribution of B_∞ + µ/(1 − φ) is the unique invariant distribution of the kernel P defined in (1.31). When |φ| > 1, the Markov chain is evanescent, i.e. for any starting point x ∈ R, P_x(lim_{n→∞} |X_n| = ∞) = 1.
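When, additionally, E[Z_1^2] = σ^2 < ∞ (an assumption beyond the integrability required above), the first two moments of the chain satisfy the recursions m_k = µ + φ m_{k−1} and v_k = φ^2 v_{k−1} + σ^2, whose limits µ/(1 − φ) and σ^2/(1 − φ^2) are the moments of the invariant distribution of B_∞ + µ/(1 − φ). A numerical sketch with hypothetical parameter values:

```python
# Moment recursions of the AR(1) chain X_k = mu + phi*X_{k-1} + Z_k
# with E[Z] = 0 and Var(Z) = sigma2 (extra second-moment assumption):
#   m_k = mu + phi * m_{k-1},   v_k = phi^2 * v_{k-1} + sigma2.
mu, phi, sigma2 = 1.0, 0.5, 2.0
m, v = 10.0, 0.0              # deterministic start X_0 = 10
for _ in range(200):
    m = mu + phi * m
    v = phi * phi * v + sigma2

assert abs(m - mu / (1 - phi)) < 1e-9          # stationary mean  mu/(1-phi)
assert abs(v - sigma2 / (1 - phi**2)) < 1e-9   # stationary variance sigma2/(1-phi^2)
```

The geometric contraction by |φ| < 1 in both recursions mirrors the almost-sure convergence argument above.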

Proposition 1.44 If E[Z_1] = 0, |φ| > 1 and the distribution of ∑_{j=1}^∞ φ^{−j} Z_j is continuous, then for all x ∈ R, P_x(lim_{n→∞} |X_n| = ∞) = 1.

Proof. If X_0 = x, applying (1.32) we have, for all n ≥ 1,

    φ^{−n} X_n = x + φ^{−n} ((φ^n − 1)/(φ − 1)) µ + ∑_{j=0}^{n−1} φ^{j−n} Z_{n−j}
               = x + ((1 − φ^{−n})/(φ − 1)) µ + ∑_{j=1}^n φ^{−j} Z_j .


Thus, by using the same martingale argument as previously, we have

    lim_{n→∞} φ^{−n} X_n = x + µ/(φ − 1) + ∑_{j=1}^∞ φ^{−j} Z_j   P_x-a.s.

Thus lim_{n→∞} |X_n| = +∞ unless possibly x + µ/(φ − 1) + ∑_{j=1}^∞ φ^{−j} Z_j = 0, which happens with zero P_x-probability for all x if the distribution of ∑_{j=1}^∞ φ^{−j} Z_j is continuous. □

The AR(1) process can be generalized by assuming that the current value is obtained as an affine combination of the p preceding values of the process and a random disturbance. Let {Z_k, k ∈ N} be a sequence of i.i.d. real-valued random variables, let φ_1, ..., φ_p be real numbers and let X_0, X_{−1}, ..., X_{−p+1} be random variables independent of the sequence {Z_k, k ∈ N}. The AR(p) process {X_k, k ∈ N} is defined by the recursion

    X_k = µ + φ_1 X_{k−1} + φ_2 X_{k−2} + ··· + φ_p X_{k−p} + Z_k ,  k ≥ p .    (1.33)

The sequence {X_k, k ∈ N} is a Markov chain of order p; see Definition 1.43. The vector process X̄_k = (X_k, X_{k−1}, ..., X_{k−p+1}) is a vector autoregressive process of order 1, defined by the recursion

    X̄_k = Bµ + Φ X̄_{k−1} + B Z_k    (1.34)

where Φ is the companion matrix with first row (φ_1, φ_2, ..., φ_p), ones on the subdiagonal and zeros elsewhere, and B = (1, 0, ..., 0)^T.

Thus {X̄_k, k ∈ N} is an R^p-valued Markov chain with kernel

    P(x, A) = P(Bµ + BZ_0 + Φx ∈ A)    (1.35)

for x ∈ R^p and A ∈ B(R^p). As above, P is a Feller kernel. By iterating (1.34) we get, for k ≥ 1,

    X̄_k = Φ^k X̄_0 + ∑_{j=0}^{k−1} Φ^j B(µ + Z_{k−j}) .    (1.36)

Because {Z_k, k ∈ N} is i.i.d., for any k the two random vectors A_k = ∑_{j=0}^{k−1} Φ^j B Z_{k−j} and B_k = ∑_{j=0}^{k−1} Φ^j B Z_j have the same distribution. Let |||Φ||| denote the operator norm of the matrix Φ, that is

    |||Φ||| = sup_{u ∈ R^p, |u| ≤ 1} |Φu| .


If the spectral radius of Φ is strictly less than 1, there exist a constant C and ρ ∈ [0, 1) such that |||Φ^j||| ≤ C ρ^j for all j ≥ 0. This implies that the sequence {B_k, k ∈ N} is a vector-valued martingale which is bounded in L^1(P), i.e.

    sup_{k≥0} E[‖B_k‖] ≤ E[|Z_0|] ‖B‖ ∑_{j=0}^∞ |||Φ^j||| < ∞ .

Hence, by Theorem D.18,

    B_k → B_∞ = ∑_{j=0}^∞ Φ^j B Z_j   P-a.s.

For every h ∈ C_b(X) we therefore have, by dominated convergence,

    lim_{k→∞} E_x[h(X̄_k)] = lim_{k→∞} E[h(Φ^k x + µ(I − Φ)^{−1}(I − Φ^k)B + B_k)]
                           = E[h(µ(I − Φ)^{−1}B + B_∞)] .

Proposition 1.42 yields that the distribution of µ(I − Φ)^{−1}B + B_∞ is the unique invariant distribution of the kernel P defined in (1.35). When the spectral radius of Φ is strictly larger than 1, the Markov chain is, as previously, evanescent.
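The geometric decay |||Φ^j||| ≤ Cρ^j can be observed directly on a companion matrix. The sketch below uses hypothetical AR(2) coefficients φ_1 = 0.5, φ_2 = 0.3 (whose characteristic roots lie inside the unit disc) and the max-row-sum norm as a convenient computable stand-in for the operator norm:

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_sum_norm(A):
    """Max absolute row sum: equivalent (up to constants) to the operator norm."""
    return max(sum(abs(a) for a in row) for row in A)

phi1, phi2 = 0.5, 0.3        # hypothetical stable AR(2) coefficients
Phi = [[phi1, phi2],         # companion matrix: coefficients on the first row,
       [1.0, 0.0]]           # ones on the subdiagonal

power = [[1.0, 0.0], [0.0, 1.0]]
norms = []
for _ in range(60):
    power = matmul(power, Phi)
    norms.append(row_sum_norm(power))

assert norms[-1] < 1e-3      # ||Phi^60|| is tiny: geometric decay
assert norms[-1] < norms[20] # and still decreasing in the tail
```

The decay rate ρ is the spectral radius of Φ, here roughly 0.85.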

1.10.2 Functional autoregressive processes

In the AR(1) model, the conditional expectation of the value of the process at time k is an affine function of the previous value: E[X_k | F^X_{k−1}] = µ + φ X_{k−1}. In addition, provided that E[Z_1^2] < ∞ in (1.29), the conditional variance is almost surely constant, since E[(X_k − E[X_k | F^X_{k−1}])^2 | F^X_{k−1}] = E[Z_1^2] P-a.s. We say that the model is conditionally homoscedastic. Of course, these assumptions can be relaxed in several directions. We might first consider models which are still conditionally homoscedastic, but for which the conditional expectation of X_k given the past is a nonlinear function of the past observation X_{k−1}, leading to the conditionally homoscedastic functional autoregressive (FAR(1)) model given by

    X_k = f(X_{k−1}) + Z_k ,    (1.37)

where {Z_k, k ∈ N*} is a sequence of integrable zero-mean i.i.d. real-valued random variables independent of X_0 and f : R → R is a measurable function. With this definition, the conditional expectation of X_k given F^X_{k−1} is given by f(X_{k−1}) = E[X_k | F^X_{k−1}]. The kernel of this chain is given by

    P(x, A) = P(f(x) + Z_1 ∈ A) ,  A ∈ B(R) .

For any h ∈ F+(X,X ), we get


    Ph(x) = E[h(f(x) + Z_1)] .

If the function f is continuous, then P is Feller. Of course, compared to the AR(1) model, this model does not lend itself well to a direct analysis, because the expressions of the successive iterates of the chain are rather involved. We will have to wait until Chapter 7 and Chapter 8 to have the necessary tools to find conditions under which this kernel is stable or evanescent.

Note that (1.37) may be seen as a general discrete-time dynamical system x_k = f(x_{k−1}) perturbed by some noise {Z_k, k ∈ N}. It is of course expected that the stability properties of the discrete-time dynamical system are related to the stability of (1.37). We will later see that this is indeed the case.

It is also of interest to consider cases in which the conditional variance

    Var(X_k | F^X_{k−1}) = E[(X_k − E[X_k | F^X_{k−1}])^2 | F^X_{k−1}]

is a function of the past observation X_{k−1}; such models are said to be conditionally heteroscedastic. Heteroscedasticity can be modeled by considering the recursion

    X_k = f(X_{k−1}) + σ(X_{k−1}) Z_k ,    (1.38)

where σ is some nonnegative measurable function. Assuming that E[Z_1^2] = 1, the conditional variance is given by Var(X_k | F^X_{k−1}) = σ^2(X_{k−1}). The kernel of this Markov chain is given, for x ∈ R and A ∈ B(R), by

    P(x, A) = P(σ(x) Z_1 ∈ A − f(x)) .

For h ∈ F_+(X, X), we get

    Ph(x) = E[h(f(x) + σ(x) Z_1)] .

If the functions f and σ are both continuous, then the kernel is Feller. Here again, except in very specific cases, these models do not lend themselves to an elementary analysis, and we will have to wait until later chapters to discuss their properties (see in particular Chapter 7 and Chapter 8). As above, these models can be generalized by assuming that the conditional expectation E[X_k | F^X_{k−1}] and the conditional variance Var(X_k | F^X_{k−1}) are nonlinear functions of the p previous values of the process, (X_{k−1}, X_{k−2}, ..., X_{k−p}):

    X_k = f(X_{k−1}, ..., X_{k−p}) + σ(X_{k−1}, ..., X_{k−p}) Z_k ,    (1.39)

where f : R^p → R and σ : R^p → R_+ are measurable functions.

Example 1.45 (ARCH(p)). It has been generally acknowledged in the econometrics and applied finance literature that many financial time series, such as log-returns of share prices, stock indices and exchange rates, exhibit stochastic volatility and heavy tails. These features cannot be adequately modelled by a linear time series model. Nonlinear models, such as the ARCH model and the bilinear


models (see Example 1.46), have been proposed to capture these and other characteristics. In order for a linear time series model to possess heavy-tailed marginal distributions, it is necessary for the input noise sequence to be heavy-tailed. For nonlinear models, heavy-tailed marginals can be obtained even when the system is injected with light-tailed noise, such as normal noise. An autoregressive conditional heteroscedastic model of order p, ARCH(p), is defined as a solution of the recursion

    X_k = σ_k Z_k ,    (1.40a)
    σ_k^2 = α_0 + α_1 X_{k−1}^2 + ··· + α_p X_{k−p}^2 ,    (1.40b)

where the coefficients α_j, j ∈ {0, ..., p}, are nonnegative and {Z_k, k ∈ Z} is a sequence of i.i.d. random variables with zero mean (often assumed to be standard Gaussian). The ARCH(p) process is a Markov chain of order p. Assume that Z_1 has a bounded continuous density g w.r.t. Lebesgue's measure on R. Then, for h ∈ F_+(R^p, B(R^p)) we get

    Ph(x_1, ..., x_p) = E[h(√(α_0 + α_1 x_1^2 + ··· + α_p x_p^2) Z_1)]
                      = ∫ h(y) (α_0 + α_1 x_1^2 + ··· + α_p x_p^2)^{−1/2} g(y / √(α_0 + α_1 x_1^2 + ··· + α_p x_p^2)) dy .

The kernel therefore has a density w.r.t. Lebesgue's measure, given by

    p(x_1, ..., x_p; y) = (α_0 + α_1 x_1^2 + ··· + α_p x_p^2)^{−1/2} g(y / √(α_0 + α_1 x_1^2 + ··· + α_p x_p^2)) .

By Proposition 10.18, this implies that the kernel P is Feller. We will later see that it is relatively easy to discuss the properties of this model, which is widely used in financial econometrics.
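The transition density above is just a rescaled noise density, so it must integrate to one in y for every past (x_1, ..., x_p). A numerical sketch with standard Gaussian g and hypothetical ARCH(2) coefficients α = (0.1, 0.4, 0.2):

```python
import math

def g(z):
    """Standard Gaussian density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def arch_density(x, y, alpha):
    """Transition density p(x_1, ..., x_p; y) of the ARCH(p) chain."""
    s = math.sqrt(alpha[0] + sum(a * xi * xi for a, xi in zip(alpha[1:], x)))
    return g(y / s) / s

alpha = [0.1, 0.4, 0.2]      # hypothetical ARCH(2) coefficients (alpha_0, alpha_1, alpha_2)
x = (1.0, -0.5)              # hypothetical past values

# midpoint-rule check that the density in y integrates to 1 over [-15, 15]
h, total = 0.01, 0.0
y = -15.0 + h / 2
while y < 15.0:
    total += arch_density(x, y, alpha) * h
    y += h
assert abs(total - 1.0) < 1e-4
```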

Example 1.46 (Simple Markov bilinear model). The simple Markov bilinear process is defined by the recursion

    X_k = a X_{k−1} + (1 + b X_{k−1}) Z_k ,    (1.41)

where a and b are scalars and {Z_k, k ∈ N} is an i.i.d. sequence of random variables independent of X_0. Assuming that E[Z_1^2] = 1 and E[Z_1] = 0, the conditional expectation of X_k is linear, E[X_k | F^X_{k−1}] = a X_{k−1}, as in an AR(1) model, but the model is conditionally heteroscedastic, with conditional variance given by Var(X_k | F^X_{k−1}) = (1 + b X_{k−1})^2. It may be seen as an AR(1) process with ARCH(1) errors. This model is useful for modeling financial time series in which the current


volatility depends on the past value, including its sign. This asymmetry has been pointed out as a characteristic feature of financial time series.

Example 1.47 (Self-exciting threshold AR model). Self-exciting threshold AR (SETAR) models have been widely employed as models for nonlinear time series. Threshold models are piecewise linear AR models in which the linear relationship varies according to delayed values of the process (hence the term self-exciting). In this class of models, it is hypothesized that several different autoregressive processes may operate and that the changes between the various AR models are governed by threshold values and a time lag. An ℓ-regime TAR model has the form

    X_k = φ_0^{(1)} + ∑_{i=1}^{p_1} φ_i^{(1)} X_{k−i} + σ^{(1)} Z_k^{(1)}   if X_{k−d} ≤ r_1 ,
          φ_0^{(2)} + ∑_{i=1}^{p_2} φ_i^{(2)} X_{k−i} + σ^{(2)} Z_k^{(2)}   if r_1 < X_{k−d} ≤ r_2 ,
          ...
          φ_0^{(ℓ)} + ∑_{i=1}^{p_ℓ} φ_i^{(ℓ)} X_{k−i} + σ^{(ℓ)} Z_k^{(ℓ)}   if r_{ℓ−1} < X_{k−d} ,    (1.42)

where {Z_k, k ∈ N} is an i.i.d. sequence of real-valued random variables, the positive integer d is a specified delay, and −∞ < r_1 < ··· < r_{ℓ−1} < ∞ defines a partition of X = R. These models allow for changes in the AR coefficients over time, and those changes are determined by comparing previous values (back-shifted by a time lag equal to d) to fixed threshold values. Each different AR model is referred to as a regime. In the definition above, the orders p_j of the AR models may differ from one regime to another, although in many applications they are equal.

The model can be generalized to include the possibility that the regimes dependon a collection of the past values of the process, or that the regimes depend on anexogenous variable (in which case the model is not self-exciting).

The popularity of TAR models is due to their being relatively simple to specify,estimate, and interpret as compared to many other nonlinear time series models.In addition, despite its apparent simplicity, the class of TAR models can reproducemany nonlinear phenomena such as stable and unstable limit cycles, jump reso-nance, harmonic distortion, modulation effects, chaos and so on.

Example 1.48 (Smooth transition autoregressive model (STAR)). Another possible extension of the autoregressive models is to replace the hard thresholds by smooth functions, thereby avoiding discontinuities in the autoregressive coefficients. As an example, consider the following smooth transition autoregressive model (which may be seen as an extension of a 2-regime TAR model)

    X_k = ∑_{i=1}^p {φ_i + π_i u(X_{k−d})} X_{k−i} + Z_k ,    (1.43)

where u is given either by

    u(x) = (1 + exp{−γ(x − c)})^{−1} ,    (1.44)

leading to the logistic smooth transition autoregressive model, or by


    u(x) = 1 − exp{−γ(x − c)^2} ,    (1.45)

leading to the exponential smooth transition autoregressive model. In both cases, γ is a positive constant and c ∈ R. In this case, the autoregressive part retains an additive form, but the coefficients entering the regression vary smoothly from one regime to the other with X_{k−d}.

1.10.3 Random coefficient autoregressive models

A process closely related to the AR(1) process is the random coefficient autoregressive (RCA) process

    X_k = A_k X_{k−1} + B_k ,    (1.46)

where {(A_k, B_k), k ∈ N*} is a sequence of i.i.d. random vectors in R^2, independent of X_0. For any h ∈ F_+(X, X) and x ∈ X, E[h(X_k) | F^X_{k−1}] = Ph(X_{k−1}), where

    Ph(x) = E[h(A_1 x + B_1)] .    (1.47)

If h ∈ C_b(X), then Lebesgue's dominated convergence theorem shows that Ph ∈ C_b(X); hence P is a Feller kernel. Despite its apparent simplicity, the random coefficient autoregressive model is rather general, and many important nonlinear models fall into the framework of (1.46). For example, consider the ARCH(1) process discussed above, where X_k = σ_k Z_k and σ_k^2 = α_0 + α_1 X_{k−1}^2. It then follows that σ_k^2 = α_1 Z_{k−1}^2 σ_{k−1}^2 + α_0, which fits into the framework of (1.46), where B_k = α_0 is deterministic. The simple bilinear model (1.41) is also an RCA process with A_k = a + b Z_k and B_k = Z_k.

Example 1.49. Consider a Markov chain whose state space X = (0,1) is the openunit interval. If the chain is at x, then pick one of the two intervals (0, x) or (x, 1)with equal probability 1/2, and move to a point y according to the uniform dis-tribution on the chosen interval. This Markov chain has a transition density w.r.t.Lebesgue measure on the interval (0,1), given by

k(x, y) = (1/(2x)) 1_{(0,x)}(y) + (1/(2(1 − x))) 1_{(x,1)}(y) . (1.48)

The first term in the sum corresponds to a move from x to the interval (0, x); the second, to a move from x to the interval (x, 1).

This Markov chain can be equivalently represented as an iterated random sequence. Let {U_k, k ∈ N} be a sequence of i.i.d. random variables uniformly distributed on the interval (0,1). Let {ε_k, k ∈ N} be a sequence of i.i.d. Bernoulli variables with probability of success 1/2, independent of {U_k, k ∈ N}. Let X_0, the initial state, be distributed according to some initial distribution ξ on (0,1), and be independent of {U_k, k ∈ N} and {ε_k, k ∈ N}. Define the sequence {X_k, k ∈ N*} as follows:


X_k = ε_k X_{k−1} U_k + (1 − ε_k)(X_{k−1} + U_k (1 − X_{k−1})) . (1.49)
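Recursion (1.49) is straightforward to simulate; an illustrative sketch (nothing here beyond the construction above):

```python
import random

def interval_chain_step(x, rng):
    # One step of Example 1.49: with probability 1/2 move uniformly into (0, x),
    # otherwise move uniformly into (x, 1) -- exactly recursion (1.49).
    u = rng.random()
    if rng.random() < 0.5:           # eps_k = 1: uniform on (0, x)
        return x * u
    return x + u * (1.0 - x)         # eps_k = 0: uniform on (x, 1)

def simulate_interval_chain(x0, n, rng):
    path = [x0]
    for _ in range(n):
        path.append(interval_chain_step(path[-1], rng))
    return path
```

Every simulated state stays inside the open unit interval, as the transition density (1.48) requires.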

Of course, RCA models can be extended to vector-valued processes. Let {(A_k, B_k), k ∈ N*} be an i.i.d. sequence such that, for each k, A_k is a p × p random matrix with real-valued entries and B_k is a random vector in R^p. Let {X_k, k ∈ N} be the Markov chain with state space R^p defined by X_0, a random vector independent of {(A_k, B_k), k ∈ N*}, and

X_k = A_k X_{k−1} + B_k . (1.50)

1.10.4 Observation driven models

Definition 1.50 (Observation driven model) Let (X, 𝒳) and (Y, 𝒴) be measurable spaces, Q be a Markov kernel on X × 𝒴 and f : X × Y → X be a measurable function. Let (Ω, F, {F_k, k ∈ N}, P) be a filtered probability space. An observation driven stochastic process {(X_k, Y_k), k ∈ N} is an adapted process taking values in X × Y such that, for all k ∈ N* and all A ∈ 𝒴,

P(Y_k ∈ A | F_{k−1}) = Q(X_{k−1}, A) , (1.51)
X_k = f(X_{k−1}, Y_k) . (1.52)

The process {(X_k, Y_k), k ∈ Z} is a Markov chain with kernel P characterized by

P((x, y), A × B) = ∫_B 1_A(f(x, z)) Q(x, dz) , (1.53)

for all (x, y) ∈ X × Y, A ∈ 𝒳 and B ∈ 𝒴. Note that Y_k is independent of Y_{k−1} conditionally on X_{k−1}, but {Y_k} may not be a Markov chain. The sequence {X_k} is a Markov chain with kernel P_1, defined for x ∈ X and A ∈ 𝒳 by

P_1(x, A) = ∫_Y 1_A(f(x, z)) Q(x, dz) . (1.54)

We can express X_k as a function of the sequence Y_1, ..., Y_k and of X_0. Writing f_y(x) for f(x, y) and f_{y_2} ∘ f_{y_1}(x) for f(f(x, y_1), y_2), we have

X_k = f_{Y_k} ∘ ··· ∘ f_{Y_1}(X_0) . (1.55)

The name observation driven comes from the fact that in statistical applications, only the sequence {Y_k} is observable. More generally, we know by Theorem 1.23 that any Markov chain can be represented in this way, with the kernel Q being a given probability measure, that is, with {Y_k} being an i.i.d. sequence independent of X_0.


Example 1.51 (ARMA(p, q)). A generalization of the AR(p) model is obtained by adding a moving average part to the autoregression:

X_k = μ + α_1 X_{k−1} + ··· + α_p X_{k−p} + Z_k + β_1 Z_{k−1} + ··· + β_q Z_{k−q} , (1.56)

where {Z_k, k ∈ Z} is a sequence of i.i.d. random variables with E[Z_0] = 0. The ARMA(p, q) process is not a Markov chain, but the multivariate time series (X_{k+1}, ..., X_{k+r}) is a Markov chain of order r = p ∨ q. Indeed, setting α_j = 0 if j > p and β_j = 0 if j > q yields

\begin{pmatrix} X_{k+1} \\ \vdots \\ X_{k+r} \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & & \\
\vdots & & \ddots & \\
0 & \cdots & 0 & 1 \\
\alpha_r & \cdots & \cdots & \alpha_1
\end{pmatrix}
\begin{pmatrix} X_k \\ \vdots \\ X_{k+r-1} \end{pmatrix}
+
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ \mu + Z_{k+r} + \beta_1 Z_{k+r-1} + \cdots + \beta_r Z_k \end{pmatrix} . \tag{1.57}

Example 1.52 (GARCH). The limitation of the ARCH model is that the squared process has the autocorrelation structure of an autoregressive process, which does not always fit the data. A generalization of the ARCH model is obtained by allowing the conditional variance to depend on the lagged squared returns (X_{t−1}², ..., X_{t−p}²) and on the lagged conditional variances. This model is called the Generalized Autoregressive Conditional Heteroscedastic (GARCH) model, defined by the recursion

X_k = σ_k Z_k , (1.58a)
σ_k² = α_0 + α_1 X_{k−1}² + ··· + α_p X_{k−p}² + β_1 σ_{k−1}² + ··· + β_q σ_{k−q}² , (1.58b)

where the coefficients α_0, ..., α_p, β_1, ..., β_q are nonnegative and {Z_k, k ∈ Z} is a sequence of i.i.d. random variables with E[Z_0²] = 1.

The GARCH(p, q) process is not a Markov chain, but the multivariate time series (σ_{k+1}², ..., σ_{k+r}²) is a Markov chain of order r = p ∨ q. Indeed, setting α_j = 0 if j > p and β_j = 0 if j > q yields

\begin{pmatrix} \sigma^2_{k+1} \\ \vdots \\ \sigma^2_{k+r} \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & & \\
\vdots & & \ddots & \\
0 & \cdots & 0 & 1 \\
\alpha_r Z^2_k + \beta_r & \cdots & \cdots & \alpha_1 Z^2_{k+r-1} + \beta_1
\end{pmatrix}
\begin{pmatrix} \sigma^2_k \\ \vdots \\ \sigma^2_{k+r-1} \end{pmatrix}
+
\begin{pmatrix} 0 \\ \vdots \\ \alpha_0 \end{pmatrix} . \tag{1.59}
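A GARCH(1,1) path following recursion (1.58) can be sketched as below (the coefficients are made up; stationarity of the variance requires α_1 + β_1 < 1, and the start at the stationary variance assumes E[Z²] = 1):

```python
import math

def simulate_garch11(alpha0, alpha1, beta1, z):
    # (1.58): X_k = sigma_k Z_k, sigma^2_k = alpha0 + alpha1 X^2_{k-1} + beta1 sigma^2_{k-1},
    # started at the stationary variance alpha0 / (1 - alpha1 - beta1).
    sig2 = alpha0 / (1.0 - alpha1 - beta1)
    xs, vols = [], []
    for zk in z:
        x = math.sqrt(sig2) * zk
        xs.append(x)
        vols.append(sig2)
        sig2 = alpha0 + alpha1 * x * x + beta1 * sig2
    return xs, vols
```

Note that, with nonnegative coefficients, the conditional variance stays at least α_0 after the first update, so the volatility never degenerates.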


These models do not allow for dependence between the volatility and the sign of the returns, since the volatility depends only on the squared returns. Such dependence, which is often observed in financial time series, is the so-called leverage effect. To accommodate this effect, several modifications of the GARCH model have been considered. We give two such examples.

Example 1.53 (EGARCH). The EGARCH(p, q) models the log-volatility as an ARMA process which is not independent of the innovation of the returns. More precisely, it is defined by the recursion

X_k = σ_k Z_k , (1.60a)
log σ_k² = α_0 + ∑_{j=1}^{p} α_j η_{k−j} + ∑_{j=1}^{q} β_j log σ_{k−j}² , (1.60b)

where {(Z_n, η_n), n ∈ N} is a sequence of i.i.d. bivariate random vectors with possibly dependent components. The original specification of the sequence {η_n} is η_k = θ Z_k + λ(|Z_k| − E[|Z_0|]).

Example 1.54 (TGARCH). The TGARCH models the volatility as a threshold ARMA process where the coefficient of the autoregressive part depends on the sign of the innovation. More precisely, it is defined by the recursion

X_k = σ_k Z_k , (1.61a)
σ_k² = α_0 + α X_{k−1}² + φ X_{k−1}² 1_{{Z_{k−1} > 0}} + β σ_{k−1}² . (1.61b)

Example 1.55 (Log-linear Poisson autoregression). Let the sequence {(X_k, Y_k), k ∈ Z}, adapted to the filtration {F_k}, be defined as follows:

L(Y_k | F_{k−1}) = Poisson(exp(X_{k−1})) , (1.62a)
X_k = a + b X_{k−1} + c log(1 + Y_k) , k ≥ 1 , (1.62b)

where a, b, c are real-valued parameters. This model belongs to the class of observation driven models. Define the function f on R × N and the kernel Q on R × 𝒫(N) by

f(x, y) = a + bx + c log(1 + y) ,

Q(x, A) = e^{−e^x} ∑_{j∈A} e^{jx} / j! ,

for x ∈ R, y ∈ N and A ⊂ N. In this log-linear model, the observation Y_k is fed into the autoregressive equation for X_k via the term c log(1 + Y_k). Adding one to the integer-valued observation is a standard way to avoid potential problems with zero counts. The log-intensity X_k can be expressed as in (1.55) in terms of the lagged responses by expanding (1.62b):


X_k = a (1 − b^k)/(1 − b) + b^k X_0 + c ∑_{i=0}^{k−1} b^i log(1 + Y_{k−i}) .

This model can also be represented as a functional autoregressive model with an i.i.d. innovation. Let {N_k, k ∈ N*} be a sequence of independent unit-rate homogeneous Poisson processes on the real line, independent of X_0. Then {X_n, n ∈ N} may be expressed as X_k = F(X_{k−1}, N_k), where F is the function defined on R × N^R by

F(x, N) = a + bx + c log(1 + N(e^x)) . (1.63)

The transition kernel P of the Markov chain {X_k, k ∈ N} can be expressed as

Ph(x) = E[h(a + bx + c log(1 + N(e^x)))] ,

for all bounded measurable functions h, where N is a unit-rate homogeneous Poisson process.

1.11 Markov Chain Monte Carlo Examples

Let ν be a measure on a state space (X, 𝒳) and let h ∈ F_+(X, 𝒳) be such that 0 < ∫_X h(x) ν(dx) < ∞. Typically, X is an open subset of R^d and ν is the Lebesgue measure, or X is countable and ν is the counting measure. For simplicity, it is assumed in the sequel that h(x) > 0 for all x ∈ X (this assumption may be easily relaxed). This function gives rise to a probability measure π on X defined by

π(A) = ∫_A h(x) ν(dx) / ∫_X h(x) ν(dx) . (1.64)

We want to estimate expectations of functions f ∈ F(X, 𝒳) with respect to π:

π(f) = ∫_X f(x) h(x) ν(dx) / ∫_X h(x) ν(dx) .

If the state space X is high-dimensional and h is complicated, direct numerical integration is not an option. The Monte Carlo solution to this problem is to simulate i.i.d. random variables Z_0, Z_1, ..., Z_{n−1} with distribution π and then to estimate π(f) by the sample mean

π̂(f) = n^{−1} ∑_{i=0}^{n−1} f(Z_i) . (1.65)

This gives an unbiased estimate with standard deviation of order O(1/√n). Furthermore, if π(f²) < ∞, then by the classical Central Limit Theorem, the normalized error √n(π̂(f) − π(f)) has a limiting normal distribution, which is also useful. The problem often encountered in applications is that it might be very difficult to simulate i.i.d. random variables with distribution π.
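The plain Monte Carlo estimator (1.65) fits in a few lines; an illustrative sketch in which the target π = N(0, 1) and f(z) = z² are our choices:

```python
import random

def mc_estimate(f, sample, n, rng):
    # hat-pi(f) = n^{-1} sum_{i<n} f(Z_i) with Z_i i.i.d. from pi: unbiased,
    # with standard deviation O(1/sqrt(n)) when pi(f^2) < infinity.
    return sum(f(sample(rng)) for _ in range(n)) / n

rng = random.Random(42)
# Estimate pi(f) = E[Z^2] = 1 for pi = N(0, 1); the error should be ~ sqrt(2/n).
est = mc_estimate(lambda z: z * z, lambda r: r.gauss(0.0, 1.0), 20000, rng)
```

With n = 20000 the standard deviation of the estimate is about 0.01, so the result is close to 1.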

Instead, the Markov chain Monte Carlo (MCMC) solution is to construct a Markov chain on X which has π as invariant distribution. Then, hopefully, if we run the Markov chain for a long time (regardless of the initial distribution), the distribution of X_n for large n will be close to the invariant distribution. Under conditions that will be made explicit in later chapters, lim_{n→∞} n^{−1} ∑_{k=0}^{n−1} f(X_k) = π(f) P_π-a.s.; see Chapter 5.

At first sight, it may seem even more difficult to find such a Markov chain than to estimate π(f) directly. However, we shall see that constructing such Markov chains is often straightforward.

1.11.1 Metropolis-Hastings Algorithm

Let Q be a Markov kernel having a density q w.r.t. ν, i.e. Q(x, A) = ∫_A q(x, y) ν(dy) for every x ∈ X and A ∈ 𝒳. For simplicity, it is assumed that q(x, y) > 0 for all (x, y) ∈ X × X; here again, this assumption can be easily relaxed.

The Metropolis-Hastings algorithm proceeds in the following way. An initial starting value X_0 is chosen. Given X_k, a candidate Y_{k+1} is sampled from Q(X_k, ·). With probability α(X_k, Y_{k+1}), this proposal is accepted and the chain moves to X_{k+1} = Y_{k+1}. Otherwise the step is rejected and the chain remains at X_{k+1} = X_k. The probability α(X_k, Y_{k+1}) of accepting the move is given by

α(x, y) = min( h(y) q(y, x) / (h(x) q(x, y)) , 1 ) . (1.66)

The last step is often called the Metropolis rejection. The name is reminiscent of rejection sampling, but this is a misleading analogy, because rejection sampling is done repeatedly until some proposal is accepted (so it always produces a new value of the state). In contrast, one Metropolis-Hastings update makes one proposal Y_{k+1}, which is the new state with probability α(X_k, Y_{k+1}); otherwise the new state X_{k+1} is the same as the old state X_k.

The acceptance ratio α(x, y) depends only on the ratio h(y)/h(x); therefore, we only need to know h up to a normalizing constant. In Bayesian inference, this property plays a crucial role.
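The update just described fits in a dozen lines; a sketch of a generic one-dimensional Metropolis-Hastings chain (the target and proposal below are our choices, and h need only be known up to a constant):

```python
import math
import random

def metropolis_hastings(h, propose, q_dens, x0, n, rng):
    # Targets pi proportional to h, with the acceptance probability of (1.66):
    # alpha(x, y) = min(1, h(y) q(y, x) / (h(x) q(x, y))).
    x, chain = x0, []
    for _ in range(n):
        y = propose(x, rng)
        ratio = (h(y) * q_dens(y, x)) / (h(x) * q_dens(x, y))
        if rng.random() < min(1.0, ratio):
            x = y                      # accept: move to the candidate
        chain.append(x)                # on rejection the chain stays where it is
    return chain

rng = random.Random(7)
chain = metropolis_hastings(
    h=lambda x: math.exp(-0.5 * x * x),                  # unnormalised N(0, 1)
    propose=lambda x, r: x + r.gauss(0.0, 1.0),          # random-walk proposal
    q_dens=lambda x, y: math.exp(-0.5 * (y - x) ** 2),   # its (symmetric) density
    x0=0.0, n=20000, rng=rng)
```

For this symmetric proposal the q-terms cancel, which is the Metropolis algorithm of Example 1.57 below; the empirical mean and variance of the chain approach 0 and 1.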

This procedure produces a Markov chain {X_k, k ∈ N} with Markov kernel P given by

P(x, A) = ∫_A α(x, y) q(x, y) ν(dy) + ᾱ(x) δ_x(A) , (1.67)

where

ᾱ(x) := ∫_X (1 − α(x, y)) q(x, y) ν(dy) . (1.68)


Proposition 1.56 The distribution π is reversible w.r.t. the Metropolis-Hastings kernel P.

Proof. For any (x, y) ∈ X × X, it holds that

h(x) α(x, y) q(x, y) = h(x) q(x, y) ∧ h(y) q(y, x) = h(y) α(y, x) q(y, x) . (1.69)

Hence, for any C ∈ 𝒳 ⊗ 𝒳,

∫∫ h(x) α(x, y) q(x, y) 1_C(x, y) ν(dx) ν(dy) = ∫∫ h(y) α(y, x) q(y, x) 1_C(x, y) ν(dx) ν(dy) . (1.70)

On the other hand,

∫∫ h(x) δ_x(dy) ᾱ(x) 1_C(x, y) ν(dx) = ∫ h(x) ᾱ(x) 1_C(x, x) ν(dx)
= ∫ h(y) ᾱ(y) 1_C(y, y) ν(dy) = ∫∫ h(y) δ_y(dx) ᾱ(y) 1_C(x, y) ν(dy) . (1.71)

Hence, summing (1.70) and (1.71), we get

∫∫ h(x) P(x, dy) ν(dx) 1_C(x, y) = ∫∫ h(y) P(y, dx) 1_C(x, y) ν(dy) ,

showing that π is reversible w.r.t. P. □

As a corollary of Proposition 1.36, we obtain that π is an invariant distribution for the Markov kernel P.

Example 1.57 (Metropolis Algorithm). The Metropolis algorithm is a particular case of the Metropolis-Hastings algorithm in which the proposal transition density is symmetric, i.e. q(x, y) = q(y, x) for every (x, y) ∈ X × X. As an example, let q be a density symmetric about 0, i.e. q(−y) = q(y) for all y ∈ X, and consider the transition density defined by q(x, y) = q(y − x). This means that if the current state is X_k, an increment Z_{k+1} is drawn from q, and the candidate Y_{k+1} = X_k + Z_{k+1} is proposed. Widely used proposals of this type have Z_{k+1} normally distributed with mean zero, or uniformly distributed on a ball or a hypercube centered at zero.

The acceptance probability (1.66) for the Metropolis algorithm can then be expressed as

α(x, y) = 1 ∧ h(y)/h(x) . (1.72)

If h(Y_{k+1}) ≥ h(X_k), then the move is always accepted, and if h(Y_{k+1}) < h(X_k), then the move is accepted with a probability strictly less than one.

The choice of the increment distribution is obviously crucial for the efficiency of the algorithm. A classical choice for q is the multivariate normal distribution with zero mean and a covariance matrix Γ to be suitably chosen.


Example 1.58 (Independent sampler). Another possibility is to set the transition density to q(x, y) = q(y), where q is again a density on X. In this case, the next candidate is drawn independently of the current state of the chain. This yields the so-called independent sampler, which is closely related to the accept-reject algorithm for random variable simulation. In this case, the acceptance probability (1.66) is given by

α(x, y) = 1 ∧ (q(x) h(y)) / (q(y) h(x)) . (1.73)

Assume for example that h is the standard Gaussian density and that q is the density of the Gaussian distribution with zero mean and variance σ², so that q(y) = h(y/σ)/σ. Assume that σ² > 1, so that the proposed values are sampled from a distribution with heavier tails than the objective distribution h. Then the acceptance probability is

\alpha(x, y) = \begin{cases} 1 & |y| \le |x| , \\ \exp(-(y^2 - x^2)(1 - \sigma^{-2})/2) & |y| > |x| . \end{cases}

Thus the algorithm accepts all moves which decrease |X_k| and only some of those which increase |X_k|.

If σ² < 1, the proposed values are sampled from a lighter-tailed distribution than h, and the acceptance probability becomes

\alpha(x, y) = \begin{cases} \exp(-(y^2 - x^2)(1 - \sigma^{-2})/2) & |y| \le |x| , \\ 1 & |y| > |x| . \end{cases}
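The heavy-tailed closed form above can be cross-checked against the generic expression (1.73); a small numeric check (the value of σ² is arbitrary):

```python
import math

def alpha_generic(x, y, sigma2):
    # (1.73) with h the standard normal density and q the N(0, sigma2) density;
    # the normalising constants cancel, so unnormalised densities suffice.
    h = lambda t: math.exp(-0.5 * t * t)
    q = lambda t: math.exp(-0.5 * t * t / sigma2)
    return min(1.0, (q(x) * h(y)) / (q(y) * h(x)))

def alpha_heavy_tail(x, y, sigma2):
    # Closed form for sigma2 > 1: accept surely when |y| <= |x|, otherwise with
    # probability exp(-(y^2 - x^2)(1 - sigma^{-2}) / 2).
    if abs(y) <= abs(x):
        return 1.0
    return math.exp(-0.5 * (y * y - x * x) * (1.0 - 1.0 / sigma2))
```

The two expressions agree at every pair (x, y) once σ² > 1.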

It is natural to inquire whether heavy-tailed or light-tailed proposal distributions should be preferred. This question will be partially answered in Theorem 6.32.

1.11.2 Data-augmentation

Throughout this section, (X, ℬ(X)) and (Y, ℬ(Y)) denote Borel spaces. Similarly to Metropolis-Hastings algorithms, we wish to target a probability distribution π defined on (X, ℬ(X)) using a sequence {X_k, k ∈ N} of X-valued random variables. Data-augmentation algorithms, introduced in [Tanner and Wong, 1987], are based on writing the target distribution π as the marginal of some distribution π*, defined on the extended space (X × Y, ℬ(X) ⊗ ℬ(Y)), in the sense that π*(dx dy) = π(dx) R(x, dy), where R is a kernel on X × ℬ(Y). Since (X, ℬ(X)) is a Borel space, [Kallenberg, 2002, Theorem 6.3] shows that there also exists a kernel S on Y × ℬ(X) and a probability measure π̄ on Y such that π*(dx dy) = π̄(dy) S(y, dx). Finally, R(x, ·) is the distribution of Y conditionally on X = x under π*, whereas S(y, ·) is the distribution of X conditionally on Y = y under π*.


In this situation, a natural idea consists in running a Markov chain {(X_k, Y_k), k ∈ N} with π* as invariant distribution and using the first-component process {X_k, k ∈ N} to propose n^{−1} ∑_{k=0}^{n−1} f(X_k) as an approximation of π(f). A significant difference between this general approach and a Metropolis-Hastings algorithm associated to the target π is that {X_k, k ∈ N} is no longer constrained to be a Markov chain. Still, in some situations, the transition from (X_k, Y_k) to (X_{k+1}, Y_{k+1}) proceeds in two steps, where the two components are updated successively. More specifically, first, Y_{k+1} is drawn given (X_k, Y_k). Second, X_{k+1} is drawn given (X_k, Y_{k+1}). Intuitively, Y_{k+1} serves as an auxiliary variable, which directs the moves of X_k to interesting regions with respect to the target distribution.

When sampling from R and S is feasible, a classical choice consists in the following two successive steps: given (X_k, Y_k),

(i) sample Y_{k+1} from R(X_k, ·),
(ii) sample X_{k+1} from S(Y_{k+1}, ·).

Fig. 1.1 In this example, sampling from R and S is feasible. [Diagram of the two-step transition from (X_k, Y_k) to (X_{k+1}, Y_{k+1}) via R and S omitted.]

It turns out that the first-component process {X_k, k ∈ N} is a Markov chain with Markov kernel RS that is reversible with respect to π.

Lemma 1.59 The distribution π is reversible with respect to the kernel RS.

Proof. Let C ∈ 𝒳 ⊗ 𝒳. The proof follows from

∫∫∫ π(dx) R(x, dy) S(y, dx′) 1_C(x, x′) = ∫∫ π*(dx dy) S(y, dx′) 1_C(x, x′)
= ∫∫∫ π̄(dy) S(y, dx) S(y, dx′) 1_C(x, x′)
= ∫∫∫ π(dx′) R(x′, dy) S(y, dx) 1_C(x, x′) . □

Assume now that sampling from R or S is infeasible. In this case, consider two "instrumental" kernels Q on (X × Y) × ℬ(Y) and T on (X × Y) × ℬ(X), which will be used to propose successive candidates for Y_{k+1} and X_{k+1}. For simplicity, assume that R(x, dy′) and Q((x, y), dy′) (resp. S(y′, dx′) and T((x, y′), dx′)) are dominated by the same measure, and call r and q (resp. s and t) the associated transition densities. As shown by the following proposition, we may replace the two previous steps by two Metropolis-Hastings transitions that target successively R(X_k, ·) and S(Y_{k+1}, ·).


Proposition 1.60 Define the Markov chain {(X_k, Y_k), k ∈ N} by the following transitions. Given (X_k, Y_k),

(i) first, propose a candidate Y according to some transition kernel Q((X_k, Y_k), ·) and accept Y_{k+1} = Y with probability

α(X_k, Y_k, Y) := ( r(X_k, Y) q((X_k, Y), Y_k) ) / ( r(X_k, Y_k) q((X_k, Y_k), Y) ) ∧ 1 ;

otherwise, set Y_{k+1} = Y_k;

(ii) second, propose a candidate X according to some transition kernel T((X_k, Y_{k+1}), ·) and accept X_{k+1} = X with probability

β(X_k, Y_{k+1}, X) := ( s(Y_{k+1}, X) t((X, Y_{k+1}), X_k) ) / ( s(Y_{k+1}, X_k) t((X_k, Y_{k+1}), X) ) ∧ 1 ;

otherwise, set X_{k+1} = X_k.

Then, the extended target distribution π*(dx dy) = π(dx) R(x, dy) = π̄(dy) S(y, dx) is reversible with respect to the transitions on each component: (X_k, Y_k) → (X_k, Y_{k+1}), described in (i), and (X_k, Y_{k+1}) → (X_{k+1}, Y_{k+1}), described in (ii):

(X_k, Y_k) —(i)→ (X_k, Y_{k+1}) —(ii)→ (X_{k+1}, Y_{k+1}) .

It is readily seen that the first-component process {X_n, n ∈ N} is not in general a Markov chain, since the distribution of X_{k+1} conditionally on (X_k, Y_k) depends, except in some special cases, on the whole couple (X_k, Y_k) and not on X_k only. As the product of two π*-reversible kernels is not π*-reversible in general, π* is an invariant probability measure of, but not reversible with respect to, the Markov kernel of the chain {(X_k, Y_k), k ∈ N}. Moreover, the general situation described in Proposition 1.60 also includes the previous particular situation where sampling from R and S was feasible. Indeed, if we choose as instrumental kernels Q((x, y), ·) = R(x, ·) and T((x, y), ·) = S(y, ·), then the acceptance probabilities α and β in Proposition 1.60 simplify to one. The candidates are then always accepted and we are back to the previous algorithm.

Proof (of Proposition 1.60). The proof directly follows from Lemma 1.61 below. □

Lemma 1.61 Let R be a kernel on X × ℬ(Y) and let K be a kernel on (X × Y) × ℬ(Y). Assume that, for each fixed x ∈ X, the probability measure R(x, ·) is reversible with respect to the kernel defined on Y × ℬ(Y) by

(y, B) ↦ K((x, y), B) , y ∈ Y , B ∈ ℬ(Y) .

Then the probability measure π* on X × Y is reversible with respect to the kernel K* on (X × Y) × (ℬ(X) ⊗ ℬ(Y)), where

π*(dx, dy) = π(dx) R(x, dy) ,

K*((x, y), C) = ∫∫ δ_x(dx′) K((x, y), dy′) 1_C(x′, y′) ,

for (x, y) ∈ X × Y and C ∈ ℬ(X) ⊗ ℬ(Y).

Proof. For all A, B in ℬ(X) ⊗ ℬ(Y), denote

φ(A, B) := ∫∫∫ π*(dx dy) δ_x(dx′) K((x, y), dy′) 1_A(x, y) 1_B(x′, y′) .

Then,

φ(A, B) = ∫∫∫ π(dx) R(x, dy) δ_x(dx′) K((x, y), dy′) 1_A(x, y) 1_B(x′, y′)
= ∫∫∫ π(dx′) R(x′, dy) δ_{x′}(dx) K((x′, y), dy′) 1_A(x, y) 1_B(x′, y′)
= ∫∫∫ π*(dx′ dy′) δ_{x′}(dx) K((x′, y′), dy) 1_A(x, y) 1_B(x′, y′) = φ(B, A) . □

1.12 Two-stage Gibbs sampler

Here and throughout this section, (X, ℬ(X)) and (Y, ℬ(Y)) denote Borel spaces. As for data-augmentation algorithms, for the two-stage Gibbs sampler we consider a distribution π* on a product space (X × Y, ℬ(X) ⊗ ℬ(Y)). But contrary to the previous case, the target distribution is π* itself and not one of its marginals. Thus, in this situation, the whole Markov chain {(X_n, Y_n), n ∈ N} is used to approximate the target distribution, and not the first-component process only.

To construct the Markov chain {(X_n, Y_n), n ∈ N} with π* as an invariant distribution, we proceed exactly as in data-augmentation algorithms. Assume that π* may be written as

π*(dx dy) = π(dx) R(x, dy) = π̄(dy) S(y, dx) ,

where π (resp. π̄) is a probability measure on X (resp. Y) and R (resp. S) is a kernel on X × ℬ(Y) (resp. on Y × ℬ(X)). The two-stage Gibbs sampler, which corresponds to the case where drawing from R and S is feasible, proceeds as follows: given (X_k, Y_k),

(i) sample Y_{k+1} from R(X_k, ·),
(ii) sample X_{k+1} from S(Y_{k+1}, ·).
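A classical concrete instance of steps (i)-(ii) is the two-stage Gibbs sampler for a standard bivariate normal with correlation ρ, whose full conditionals are R(x, ·) = N(ρx, 1 − ρ²) and S(y, ·) = N(ρy, 1 − ρ²). A sketch (ρ and the run length are our choices):

```python
import math
import random

def two_stage_gibbs(rho, n, rng, x0=0.0, y0=0.0):
    # (i)  Y_{k+1} ~ R(X_k, .)     = N(rho * X_k,     1 - rho^2)
    # (ii) X_{k+1} ~ S(Y_{k+1}, .) = N(rho * Y_{k+1}, 1 - rho^2)
    s = math.sqrt(1.0 - rho * rho)
    x, chain = x0, []
    for _ in range(n):
        y = rng.gauss(rho * x, s)
        x = rng.gauss(rho * y, s)
        chain.append((x, y))
    return chain

rng = random.Random(3)
chain = two_stage_gibbs(0.5, 20000, rng)
```

The empirical correlation of the chain approaches ρ, and both marginals approach N(0, 1).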

Of course, if sampling from R or S is infeasible, one can replace the previous Gibbs transitions by a Metropolis-Hastings step on each component, as described in Proposition 1.60. This is the two-stage Metropolis-within-Gibbs algorithm. The kernels R and S can be interpreted in terms of the law of one component conditionally on the other one under π*, and thus these kernels are often written with a slight abuse of notation: R(x, dy) = π*(dy|x) and S(y, dx) = π*(dx|y).

Example 1.62 (The slice sampler). The slice sampler is an improved MCMC algorithm based on an auxiliary variable technique. Its theoretical properties have been analysed by various authors, including among others [Mira and Tierney, 1997] and [Roberts and Rosenthal, 1999]. Set X = R^d and 𝒳 = ℬ(X). Let µ be a σ-finite measure on (X, 𝒳). Denote by π the density with respect to µ of the target distribution. We assume that, for all x ∈ X,

π(x) = C ∏_{i=0}^{k} f_i(x) ,

where C is a constant (which is not necessarily known) and the f_i : R^d → R_+ are nonnegative functions. The f_0 slice sampler proceeds as follows: given X_n, draw independently k random variables Y_{n+1,1}, ..., Y_{n+1,k} such that Y_{n+1,i} ∼ U(0, f_i(X_n)). Then, sample X_{n+1} from the truncated probability having density proportional to f_0(·) 1_{L(Y_{n+1})}(·), where Y_{n+1} = (Y_{n+1,1}, ..., Y_{n+1,k}) and, for y = (y_1, ..., y_k) ∈ [0,1]^k,

L(y) = {x ∈ R^d : f_i(x) ≥ y_i , i = 1, ..., k} .

Now, define Y = [0,1]^k and, for all (x, y) ∈ X × Y,

π*(x, y) = C f_0(x) ∏_{i=1}^{k} 1_{[0, f_i(x)]}(y_i) = π(x) ∏_{i=1}^{k} ( 1_{[0, f_i(x)]}(y_i) / f_i(x) ) , y = (y_1, ..., y_k) .
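A minimal instance of the slice sampler with k = 1 and f_0 ≡ 1: for the target f_1(x) = exp(−x²/2), the slice L(y) is the interval [−√(−2 log y), √(−2 log y)]. A sketch (the run parameters are our choices):

```python
import math
import random

def slice_sampler_step(x, rng):
    # Draw Y ~ U(0, f1(x)) with f1(x) = exp(-x^2 / 2), then draw the new state
    # uniformly on the slice L(Y) = {x' : f1(x') >= Y}, a symmetric interval.
    y = rng.uniform(0.0, math.exp(-0.5 * x * x))
    half_width = math.sqrt(-2.0 * math.log(y))
    return rng.uniform(-half_width, half_width)

rng = random.Random(11)
x, chain = 0.0, []
for _ in range(20000):
    x = slice_sampler_step(x, rng)
    chain.append(x)
```

The chain targets the standard normal distribution, so its empirical mean and variance approach 0 and 1.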

It can be easily checked that the transition from (X_n, Y_n) to (X_{n+1}, Y_{n+1}) corresponds to a complete Gibbs sampler transition associated to the target density π*, where the second component is updated according to π*(y|x) = π*(x, y) / ∫_Y π*(x, v) dv and then the first component is updated according to π*(x|y) = π*(x, y) / ∫_X π*(u, y) du.

This implies that {X_n, n ∈ N} is a π-reversible Markov chain (see Lemma 1.59). Indeed, denoting by P the Markov kernel associated to {X_n, n ∈ N}, we obtain, for all A ∈ 𝒳 ⊗ 𝒳,

∫∫ 1_A(x, x′) π(dx) P(x, dx′) = ∫∫∫ 1_A(x, x′) π(dx) ( π*(x, y) / π(x) ) ( π*(x′, y) / ∫_X π*(u, y) du ) dy dx′

= ∫∫ 1_A(x, x′) ( ∫ ( π*(x, y) π*(x′, y) / ∫_X π*(u, y) du ) dy ) dx dx′

= ∫∫ 1_A(x′, x) π(dx) P(x, dx′) ,

where the last equality follows from the fact that the term between brackets is symmetric with respect to (x, x′).


Chapter 2
The strong Markov property and its applications

2.1 Stopping Times

In this section, we consider a filtered probability space (Ω, F, {F_k, k ∈ N}, P) and an adapted process {(X_n, F_n), n ∈ N}. We set F_∞ = ⋁_{k∈N} F_k.

Definition 2.1 (Stopping times) A random variable τ from Ω to N̄ = N ∪ {∞} is called a stopping time if, for all k ∈ N, {τ = k} ∈ F_k. The family F_τ of events A ∈ F_∞ such that, for every k ∈ N, A ∩ {τ = k} ∈ F_k, is called the σ-field of events prior to time τ.

Since {τ = n} = {τ ≤ n} \ {τ ≤ n − 1}, one can replace {τ = n} by {τ ≤ n} in the definition of a stopping time τ and in the definition of the σ-field F_τ. Moreover, by straightforward algebra, it can easily be checked that F_τ is indeed a σ-field. It may sometimes be useful to note that constant random variables are also stopping times. In such a case, there exists some n ∈ N such that τ(ω) = n for every ω ∈ Ω, and F_τ = F_n.

Definition 2.2 (Hitting times and return times) For A ∈ 𝒳, the first hitting time τ_A and return time σ_A of the set A by the process {X_n, n ∈ N} are defined respectively by

τ_A = inf{n ≥ 0 : X_n ∈ A} , (2.1)
σ_A = inf{n ≥ 1 : X_n ∈ A} , (2.2)

where, by convention, inf ∅ = +∞. The successive return times σ_A^{(n)}, n ≥ 0, are defined inductively by σ_A^{(0)} = 0 and, for all k ≥ 0,

σ_A^{(k+1)} = inf{n > σ_A^{(k)} : X_n ∈ A} . (2.3)

It can be readily checked that return and hitting times are stopping times. For example,

{τ_A = n} = ⋂_{k=0}^{n−1} {X_k ∉ A} ∩ {X_n ∈ A} ∈ F_n ,

so that τ_A is a stopping time.
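Hitting and return times are easy to compute on a realized trajectory, and the identity {τ_A = n} ∈ F_n is visible in code: τ_A is determined by the prefix X_0, ..., X_n alone. An illustrative sketch:

```python
def hitting_time(path, in_A):
    # tau_A = inf{n >= 0 : X_n in A}; None encodes inf(empty set) = +infinity.
    for n, x in enumerate(path):
        if in_A(x):
            return n
    return None

def return_time(path, in_A):
    # sigma_A = inf{n >= 1 : X_n in A}; the two differ only when X_0 in A.
    for n, x in enumerate(path):
        if n >= 1 and in_A(x):
            return n
    return None
```

On the path (0, 1, −2, 3) with A = (−∞, 0), for instance, τ_A = σ_A = 2, and the same value is obtained from the prefix (0, 1, −2).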

Proposition 2.3 Let (Ω, F, {F_k, k ∈ N}, P) be a filtered probability space and let τ and σ be two stopping times for the filtration {F_n, n ≥ 0}. Denote by F_τ and F_σ the σ-fields of the events prior to τ and σ, respectively. Then,

(i) τ ∧ σ, τ ∨ σ and τ + σ are stopping times,
(ii) if τ ≤ σ, then F_τ ⊂ F_σ,
(iii) F_{τ∧σ} = F_τ ∩ F_σ,
(iv) {τ < σ} ∈ F_τ ∩ F_σ and {τ = σ} ∈ F_τ ∩ F_σ.

Proof. (i) Let n ∈ N. We show that the events {τ ∧ σ ≤ n}, {τ ∨ σ ≤ n} and {τ + σ ≤ n} belong to F_n. Since

{τ ∧ σ ≤ n} = {τ ≤ n} ∪ {σ ≤ n}

and τ and σ are stopping times, {τ ≤ n} and {σ ≤ n} belong to F_n; therefore {τ ∧ σ ≤ n} ∈ F_n. Similarly, {τ ∨ σ ≤ n} = {τ ≤ n} ∩ {σ ≤ n} ∈ F_n. Finally,

{τ + σ ≤ n} = ⋃_{k=0}^{n} {τ ≤ k} ∩ {σ ≤ n − k} .

Now, for 0 ≤ k ≤ n, {τ ≤ k} ∈ F_k ⊂ F_n and {σ ≤ n − k} ∈ F_{n−k} ⊂ F_n; hence {τ + σ ≤ n} ∈ F_n.

(ii) Let A ∈ F_τ and n ∈ N. As {σ ≤ n} ⊂ {τ ≤ n}, A ∩ {σ ≤ n} = A ∩ {τ ≤ n} ∩ {σ ≤ n}. But A ∩ {τ ≤ n} ∈ F_n and {σ ≤ n} ∈ F_n (σ is a stopping time); therefore A ∩ {τ ≤ n} ∩ {σ ≤ n} ∈ F_n and A ∩ {σ ≤ n} ∈ F_n. Thus A ∈ F_σ.

(iii) It follows from (i) and (ii) that F_{τ∧σ} ⊂ F_τ ∩ F_σ. Conversely, let A ∈ F_τ ∩ F_σ. Obviously A ∈ F_∞. To prove that A ∈ F_{τ∧σ}, one must show that, for every k ≥ 0, A ∩ {τ ∧ σ ≤ k} ∈ F_k. We have A ∩ {τ ≤ k} ∈ F_k and A ∩ {σ ≤ k} ∈ F_k. Hence, since {τ ∧ σ ≤ k} = {τ ≤ k} ∪ {σ ≤ k}, we get

A ∩ {τ ∧ σ ≤ k} = A ∩ ({τ ≤ k} ∪ {σ ≤ k}) = (A ∩ {τ ≤ k}) ∪ (A ∩ {σ ≤ k}) ∈ F_k .

(iv) Let n ∈ N. It holds that

{τ < σ} ∩ {τ ≤ n} = ⋃_{k=0}^{n} {τ = k} ∩ {σ > k} .

But, for 0 ≤ k ≤ n, {τ = k} = {τ ≤ k} ∩ {τ ≤ k − 1}^c ∈ F_k ⊂ F_n and {σ > k} = {σ ≤ k}^c ∈ F_k ⊂ F_n. Therefore {τ < σ} ∩ {τ ≤ n} ∈ F_n, showing that {τ < σ} ∈ F_τ. Similarly,

{τ < σ} ∩ {σ ≤ n} = ⋃_{k=0}^{n} {σ = k} ∩ {τ < k}

and since, for 0 ≤ k ≤ n, {σ = k} ∈ F_k ⊂ F_n and {τ < k} = {τ ≤ k − 1} ∈ F_{k−1} ⊂ F_n, it also holds that {τ < σ} ∩ {σ ≤ n} ∈ F_n, so that {τ < σ} ∈ F_σ. Finally, {τ < σ} ∈ F_τ ∩ F_σ. The last statement of the proposition follows from

{τ = σ} = {τ < σ}^c ∩ {σ < τ}^c ∈ F_τ ∩ F_σ . □

We want to define the position of the process {X_n} at time τ, i.e. X_τ(ω) = X_{τ(ω)}(ω). This quantity is not defined when τ(ω) = ∞. To handle this situation, we select an arbitrary F_∞-measurable random variable X_∞ and we set

X_τ = X_k on {τ = k} , k ∈ N̄ .

Note that the random variable X_τ is F_τ-measurable since, for A ∈ 𝒳 and k ∈ N,

{X_τ ∈ A} ∩ {τ = k} = {X_k ∈ A} ∩ {τ = k} ∈ F_k .

2.2 The Shift operator and the Markov property

Definition 2.4 (Shift operator) Let (X, 𝒳) be a measurable space. The map θ : X^N → X^N, defined by

θ : w = (w_0, w_1, w_2, ...) ↦ θ(w) = (w_1, w_2, ...) ,

is called the shift operator.

Lemma 2.5 The shift operator θ is measurable with respect to 𝒳^{⊗N}.

Proof. Consider the cylinder H × X^N, that is,

H × X^N = {w ∈ X^N : (w_0, ..., w_{n−1}) ∈ H} .

Then,

θ^{−1}(H × X^N) = {w ∈ X^N : (w_0, ..., w_n) ∈ X × H} = X × H × X^N ,

which is another cylinder. Since the cylinders generate the basic σ-field, 𝒳^{⊗N} = σ(𝒞_0), where 𝒞_0 is the semialgebra of cylinders, the shift operator is measurable. □

We define inductively θ_0 as the identity function, i.e. θ_0(w) = w for all w ∈ X^N, and, for k ≥ 1,

θ_k = θ_{k−1} ∘ θ .

Let {X_k, k ∈ N} be the coordinate process on X^N, as defined in (1.20). Then, for (j, k) ∈ N², it holds that

X_k ∘ θ_j = X_{j+k} .

Moreover, for all p, k ∈ N and A_0, ..., A_p ∈ 𝒳,

θ_k^{−1}{X_0 ∈ A_0, ..., X_p ∈ A_p} = {X_k ∈ A_0, ..., X_{k+p} ∈ A_p} ;

thus θ_k is measurable as a map from (X^N, σ(X_j, j ≥ k)) to (X^N, F_∞). Let τ be a stopping time. Define θ_τ on {τ < ∞} by

θ_τ(w) = θ_{τ(w)}(w) . (2.4)

With this definition, we have X_τ = X_k on {τ = k} and X_k ∘ θ_τ = X_{τ+k} on {τ < ∞}.

Proposition 2.6 Let {F_n, n ∈ N} be the natural filtration of the coordinate process {X_n, n ∈ N} on X^N. Let τ and σ be two stopping times with respect to {F_n, n ∈ N}.

(i) For all (n, m) ∈ N², θ_n^{−1}(F_m) = σ(X_n, ..., X_{n+m}).
(ii) For every positive integer k, k + τ ∘ θ_k is a stopping time.
(iii) ρ = σ + τ ∘ θ_σ is a stopping time. If σ and τ are finite, then X_τ ∘ θ_σ = X_ρ.

Proof. (i) For all A ∈ 𝒳 and all (k, n) ∈ N²,

θ_n^{−1}{X_k ∈ A} = {X_k ∘ θ_n ∈ A} = {X_{k+n} ∈ A} .

Since the σ-field F_m is generated by the events of the form

B = {X_0 ∈ A_0} ∩ {X_1 ∈ A_1} ∩ ··· ∩ {X_m ∈ A_m} ,

the σ-field θ_n^{−1}(F_m) is generated by the events

θ_n^{−1}(B) = θ_n^{−1}{X_0 ∈ A_0} ∩ θ_n^{−1}{X_1 ∈ A_1} ∩ ··· ∩ θ_n^{−1}{X_m ∈ A_m}
= {X_n ∈ A_0} ∩ {X_{n+1} ∈ A_1} ∩ ··· ∩ {X_{n+m} ∈ A_m} .

These events generate the σ-field σ(X_n, ..., X_{n+m}) ⊂ F_{n+m}.

(ii) Since τ is a stopping time, {τ = m − k} ∈ F_{m−k} and, by (i), it also holds that θ_k^{−1}{τ = m − k} ∈ F_m. Thus,

{k + τ ∘ θ_k = m} = {τ ∘ θ_k = m − k} = θ_k^{−1}{τ = m − k} ∈ F_m .

This proves that k + τ ∘ θ_k is a stopping time.

(iii) From the definition of ρ, we obtain


{ρ = m} = {σ + τ ∘ θ_σ = m} = ⋃_{k=0}^{m} {k + τ ∘ θ_k = m} ∩ {σ = k} .

Since σ is a stopping time and we have just seen that k + τ ∘ θ_k is a stopping time for each k, we obtain that {ρ = m} ∈ F_m. Thus ρ is a stopping time. By construction, if τ(ω) and σ(ω) are finite, we have

X_τ ∘ θ_σ(ω) = X_{τ(θ_σ(ω))}(θ_σ(ω)) = X_{σ(ω)+τ(θ_σ(ω))}(ω) = X_ρ(ω) . □

Lemma 2.7 The successive hitting and return times to a measurable set A are stopping times with respect to the natural filtration of the process {Xn, n ∈ N}. In addition, σA = 1 + τA ∘ θ1 and, for n ≥ 0, σA^(n+1) = σA^(n) + σA ∘ θ_{σA^(n)} on {σA^(n) < ∞}.

Proof. The proof is a straightforward application of Proposition 2.6 (iii). □
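For intuition, the hitting, return and successive return times of Definition 2.2 can be read directly off a sample path. The following minimal sketch (the function names are ours, not the book's) uses the length of the observed path as a stand-in for ∞ and implements the recursion σA^(n+1) = σA^(n) + σA ∘ θ_{σA^(n)} of Lemma 2.7:

```python
def hitting_time(path, A):
    """tau_A = inf{n >= 0 : X_n in A}; len(path) stands in for infinity."""
    return next((n for n, x in enumerate(path) if x in A), len(path))

def return_time(path, A):
    """sigma_A = inf{n >= 1 : X_n in A}."""
    return next((n for n, x in enumerate(path) if n >= 1 and x in A), len(path))

def successive_returns(path, A, N):
    """sigma_A^(n), n = 1..N, via sigma^(n+1) = sigma^(n) + sigma_A o theta_{sigma^(n)}."""
    times, t = [], 0
    for _ in range(N):
        t += return_time(path[t:], A)   # the shift theta_t drops the first t steps
        if t >= len(path):
            break                        # no further return observed
        times.append(t)
    return times

path = [2, 1, 0, 3, 0, 0, 4]
assert hitting_time(path, {0}) == 2
assert return_time(path, {0}) == 2
assert successive_returns(path, {0}, 3) == [2, 4, 5]
```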

In what follows, a Markov kernel P on X × X is given, and we consider the probability measure Pν on (X^N, X^⊗N) (constructed according to Theorem 1.25) induced by a Markov chain with initial distribution ν ∈ M1(X) and transition kernel P. Moreover, the notation Eν stands for the associated expectation operator.

Proposition 2.8 (Markov property) For every F∞-measurable positive or bounded random variable Y, every initial distribution ν ∈ M1(X) and every k ∈ N, it holds that

Eν[Y ∘ θk | Fk] = E_{Xk}[Y] Pν-a.s. (2.5)

Proof. We use the monotone class theorem stated in Theorem A.21. Let H be the vector space of bounded random variables Y such that (2.5) holds and let C be the set of cylinders. By the monotone convergence theorem, if {Yn, n ∈ N} is a nondecreasing sequence of nonnegative random variables in H such that limn→∞ Yn = Y is bounded, then Y satisfies (2.5). We need to show that, for any bounded measurable function f defined on X^{k+1},

Eν[f(X0, . . . , Xk) Y ∘ θk] = Eν[f(X0, . . . , Xk) E_{Xk}[Y]] . (2.6)

By Theorem A.22, it suffices to check (2.6) with Y = g(X0, . . . , Xj), where j ≥ 0 and g is any bounded measurable function on X^{j+1}, i.e.

Eν[f(X0, . . . , Xk) g(Xk, . . . , Xk+j)] = Eν[f(X0, . . . , Xk) E_{Xk}[g(X0, . . . , Xj)]] ,

which follows easily from (1.15). □

We illustrate the Markov property through a simple example.


Example 2.9. Assume that there exists a set C ∈ X such that, for all x ∉ C, Px(σC < ∞) = 1. Then, for all x ∈ C, we have

Px(σC < ∞) = Px(X1 ∈ C) + Px(X1 ∉ C, σC ∘ θ1 < ∞)
= Px(X1 ∈ C) + Ex[1{X1∉C} P_{X1}(σC < ∞)]
= Px(X1 ∈ C) + Px(X1 ∉ C) = 1 .

Therefore, for all x ∈ X, Px(σC < ∞) = 1.

Proposition 2.10 (Strong Markov property) For all F∞-measurable positive or bounded random variables Y, initial distributions ν, and stopping times τ, it holds that

Eν[Y ∘ θτ 1{τ<∞} | Fτ] = E_{Xτ}[Y] 1{τ<∞} Pν-a.s. (2.7)

Proof. We will show that, for all A ∈ Fτ,

Eν[1A Y ∘ θτ 1{τ<∞}] = Eν[1A E_{Xτ}[Y] 1{τ<∞}] . (2.8)

Since A ∩ {τ = k} ∈ Fk, Proposition 2.8 yields

Eν[1_{A∩{τ=k}} Y ∘ θτ] = Eν[1_{A∩{τ=k}} Y ∘ θk] = Eν[1_{A∩{τ=k}} E_{Xk}[Y]] = Eν[1_{A∩{τ=k}} E_{Xτ}[Y]] .

Equation (2.8) follows by noting that

Eν[1A Y ∘ θτ 1{τ<∞}] = ∑_{k=0}^{∞} Eν[1_{A∩{τ=k}} Y ∘ θk] = ∑_{k=0}^{∞} Eν[1_{A∩{τ=k}} E_{Xτ}[Y]] = Eν[1A 1{τ<∞} E_{Xτ}[Y]] .

□

The expectation in the right-hand side of (2.7) must be understood in the following way. Define the function g on X by g(x) = Ex[Y]. Then g is measurable and E_{Xτ}[Y] = g(Xτ). It may happen that the function g is constant over the range of Xτ 1{τ<∞}. This is obviously the case if Xτ 1{τ<∞} is constant, which happens for instance if τ = τa, the hitting time of a state a ∈ X. If the state space is not discrete, it is possible that τa = ∞ Px-a.s. for all a ∈ X and all x ≠ a. However, there may exist sets such that the distribution of the chain started from a state of the set does not depend on the particular starting state. Such sets are called atoms.

Definition 2.11 (Atom) Let P be a Markov kernel on a measurable space (X, X). A set α ∈ X is called an atom if there exists a probability measure ν on X such that P(x, ·) = ν for all x ∈ α. If α is an atom, we write P(α, ·) for the common value ν, and Pα, Eα for Px, Ex with x ∈ α.

If τα < ∞ Pα-a.s., then the strong Markov property yields that the chain after each visit to the atom is independent of its past.


Proposition 2.12 Let α be an atom such that Pα(τα < ∞) = 1. Then, under Pα, the sequence (X_{τα}, X_{τα+1}, . . .) is independent of (X0, . . . , X_{τα−1}) and has distribution Pα.

Induced Markov chain

Let F ∈ X. Assume that the chain is only observed when it visits the set F. The resulting process {Xn, n ∈ N} may be obtained by setting Xn = X_{σF^(n)}, where σF^(n) is the n-th return time to the set F, as defined in Definition 2.2. As a consequence of the strong Markov property, this process is a Markov chain, called the induced chain. Let X_F be the σ-field restricted to F, i.e. X_F = {A ∩ F : A ∈ X}.

Proposition 2.13 Let P be a Markov kernel and F ∈ X. Assume that Px(σF < ∞) = 1 for all x ∈ F.

(i) For all n ∈ N and x ∈ F, Px(σF^(n) < ∞) = 1.
(ii) For all x ∈ F, the induced process {Xn, n ∈ N} is a Markov chain under Px with kernel PF defined on F × X_F by

PF(x,B) = Px(X_{σF} ∈ B) , x ∈ F , B ∈ X_F .

(iii) Let A ⊂ F and let σ̂A be the return time to the set A by the induced chain. Then, for all x ∈ F,

Ex[σA] ≤ Ex[σ̂A] sup_{y∈F} Ey[σF] .

Proof. For convenience, we set σ^(n) = σF^(n) for every n ∈ N.
(i) The proof proceeds by induction on n ≥ 1. First note that, by assumption, Px(σ^(1) < ∞) = 1. Now, assume that Px(σ^(n) < ∞) = 1 for all x ∈ F. By the strong Markov property, we get, for all x ∈ F,

Px(σ^(n+1) < ∞) = Px(σ^(n) < ∞, σ ∘ θ_{σ^(n)} < ∞) = Ex[1{σ^(n)<∞} P_{X_{σ^(n)}}(σ < ∞)] = Px(σ^(n) < ∞) = 1 .

(ii) Let x ∈ F. Since σ^(n+1) = σ^(n) + σ ∘ θ_{σ^(n)} on {σ^(n) < ∞}, Proposition 2.6-(iii) shows that X_{σ^(n+1)} = X_σ ∘ θ_{σ^(n)} on {σ^(n) < ∞}. Noting that, by (i), Px(σ^(n) < ∞) = 1, the strong Markov property applied to the Markov chain {Xn} yields, for any B ∈ X,

Px(X_{σ^(n+1)} ∈ B | F_{σ^(n)}) = Px(X_σ ∘ θ_{σ^(n)} ∈ B | F_{σ^(n)}) = P_{X_{σ^(n)}}(X_σ ∈ B) = PF(X_{σ^(n)}, B) .


The proof follows.
(iii) With σ̂A denoting the return time to A by the induced chain, Eq. (2.3) implies

σA = ∑_{n=0}^{σ̂A−1} (σ^(n+1) − σ^(n)) = ∑_{n=0}^{∞} (σ^(n+1) − σ^(n)) 1{n<σ̂A} = ∑_{n=0}^{∞} σ ∘ θ_{σ^(n)} 1{n<σ̂A} .

Let x ∈ F. Note that {n < σ̂A} = ∩_{i=1}^{n} {X_{σ^(i)} ∉ A} ∈ F_{σ^(n)} and, applying again (i), we have Px(σ^(n) < ∞) = 1. We then obtain by the strong Markov property,

Ex[σA] = ∑_{n=0}^{∞} Ex[σ ∘ θ_{σ^(n)} 1{n<σ̂A}] = ∑_{n=0}^{∞} Ex[1{n<σ̂A} E_{X_{σ^(n)}}[σ]] ≤ Ex[σ̂A] sup_{y∈F} Ey[σ] .

□
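On a finite state space, the induced kernel PF can be computed exactly by first-step analysis: writing Q and R for the blocks of P from F^c to F^c and from F^c to F, the matrix H = (I−Q)⁻¹R collects the probabilities Pz(X_{τF} = y), and PF(x, ·) follows by conditioning on the first step. A small sanity check on a toy 4-state kernel of our own (numpy assumed available):

```python
import numpy as np

# toy 4-state kernel, rows sum to 1; F = {0, 1} and its complement T = {2, 3}
P = np.array([[0.10, 0.20, 0.30, 0.40],
              [0.30, 0.10, 0.40, 0.20],
              [0.20, 0.30, 0.10, 0.40],
              [0.25, 0.25, 0.25, 0.25]])
F, T = [0, 1], [2, 3]

Q = P[np.ix_(T, T)]                           # moves staying outside F
R = P[np.ix_(T, F)]                           # moves entering F
H = np.linalg.solve(np.eye(len(T)) - Q, R)    # H[z, y] = Pz(X_{tau_F} = y)
PF = P[np.ix_(F, F)] + P[np.ix_(F, T)] @ H    # PF(x, y) = Px(X_{sigma_F} = y)

print(PF.sum(axis=1))   # both rows sum to 1: from F, the chain returns to F a.s.
```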

2.3 Potential kernel, harmonic and superharmonic functions

Let P be a Markov kernel on (X, X) and let {Xk, k ∈ N} be the associated canonical Markov chain as described in Definition 1.27.

Definition 2.14 (Occupation time, potential kernel, potential function) The occupation time NA is the number of visits by {Xk, k ∈ N} to a set A ∈ X, i.e.

NA = ∑_{k=0}^{∞} 1A(Xk) . (2.9)

For every x ∈ X and A ∈ X, the expected number of visits to the set A by the chain {Xk, k ∈ N} starting at x is denoted

U(x,A) = Ex[NA] = ∑_{k=0}^{∞} P^k(x,A) . (2.10)

The kernel U : (x,A) ↦ U(x,A) is the potential kernel associated to P. A nonnegative function f ∈ F+(X,X) is called a potential if there exists a function g ∈ F+(X,X) such that f = Ug.

Remark 2.15 For each x ∈ X, the function U(x, ·) defines a measure on X which is not necessarily σ-finite.
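For a finite chain, the series (2.10) can be summed in closed form on a transient part of the space: if Q is the restriction of P to a set of transient states, then U = ∑_k Q^k = (I−Q)⁻¹ there, while U(x, ·) puts infinite mass on an absorbing state, in line with Remark 2.15. A toy numerical check (the kernel is our own example; numpy assumed):

```python
import numpy as np

# toy kernel: states 0 and 1 are transient, state 2 is absorbing
P = np.array([[0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5],
              [0.0, 0.0, 1.0]])
Q = P[:2, :2]                               # restriction to the transient states
U = np.linalg.inv(np.eye(2) - Q)            # U(x, y) = sum_k Q^k(x, y), cf. (2.10)

# cross-check against a truncation of the series in (2.10)
S = sum(np.linalg.matrix_power(Q, k) for k in range(200))
print(np.allclose(U, S))                    # the series and (I - Q)^{-1} agree
```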

Theorem 2.16 (Maximum principle). Let P be a Markov kernel on X × X.

(i) Let f, g ∈ F+(X,X) and a ≥ 0. If U f(x) ≤ Ug(x) + a for every x such that f(x) > 0, then for every x ∈ X, U f(x) ≤ Ug(x) + a Px(τ_{A_f} < ∞), where A_f = {f > 0}.


(ii) For all x ∈ X and A ∈ X, U(x,A) ≤ Px(τA < ∞) sup_{y∈A} U(y,A).

Proof.
(i) Let A_f = {f > 0}. Then, for all x ∈ X,

U f(x) = Ex[∑_{n=0}^{∞} f(Xn)] = Ex[∑_{n≥τ_{A_f}} f(Xn) 1{τ_{A_f}<∞}] = ∑_{n=0}^{∞} Ex[f(Xn ∘ θ_{τ_{A_f}}) 1{τ_{A_f}<∞}] .

Applying the strong Markov property yields

U f(x) = ∑_{n=0}^{∞} Ex[1{τ_{A_f}<∞} E_{X_{τ_{A_f}}}[f(Xn)]]
= Ex[1{τ_{A_f}<∞} E_{X_{τ_{A_f}}}[∑_{n=0}^{∞} f(Xn)]] = Ex[1{τ_{A_f}<∞} U f(X_{τ_{A_f}})]
≤ Ex[1{τ_{A_f}<∞} Ug(X_{τ_{A_f}})] + a Px(τ_{A_f} < ∞)
= Ex[1{τ_{A_f}<∞} ∑_{n≥0} g(Xn ∘ θ_{τ_{A_f}})] + a Px(τ_{A_f} < ∞)
≤ Ug(x) + a Px(τ_{A_f} < ∞) .

This proves (i).
(ii) Define f = 1A, g = 0 and a = sup_{y∈A} U(y,A). Then A = {f > 0} and we apply (i) to conclude. □

Proposition 2.17 Let A ∈X .

(i) If there exists δ ∈ (0,1) such that Px(σA < ∞) ≤ δ for all x ∈ A, then U(x,A) ≤ (1−δ)⁻¹ for all x ∈ X.
(ii) If Px(σA < ∞) = 1 for all x ∈ A, then Px(NA = ∞) = 1 for all x ∈ A.

Proof.
(i) For p ∈ N, σA^(p+1) = σA^(p) + σA ∘ θ_{σA^(p)} on {σA^(p) < ∞}. Applying the strong Markov property yields

Px(σA^(p+1) < ∞) = Px(σA^(p) < ∞, σA ∘ θ_{σA^(p)} < ∞) = Ex[1{σA^(p)<∞} P_{X_{σA^(p)}}(σA < ∞)] ≤ δ Px(σA^(p) < ∞) .

By induction, we obtain Px(σA^(p) < ∞) ≤ δ^p for every p ∈ N and x ∈ A. Thus, for x ∈ A,

U(x,A) = Ex[NA] ≤ 1 + ∑_{p=1}^{∞} Px(σA^(p) < ∞) ≤ (1−δ)⁻¹ .

The result then follows from Theorem 2.16-(ii).
(ii) By Proposition 2.13-(i), Px(σA^(n) < ∞) = 1 for every n ∈ N and x ∈ A. Then, Px(NA = ∞) = Px(∩_{n=1}^{∞} {σA^(n) < ∞}) = 1.

□
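The bound of (i) is attained by a chain that returns to A a geometric number of times; a two-state check (our own toy example, numpy assumed):

```python
import numpy as np

# A = {0}; from 0 the chain returns to 0 w.p. delta, otherwise it is absorbed in 1
delta = 0.3
P = np.array([[delta, 1 - delta],
              [0.0,   1.0]])

# U(0, {0}) = sum_k P^k(0, 0) = sum_k delta^k, a geometric series
U00 = sum(np.linalg.matrix_power(P, k)[0, 0] for k in range(500))
print(abs(U00 - 1 / (1 - delta)))   # essentially zero: the bound (1 - delta)^{-1} is tight
```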

Definition 2.18 (Harmonic and superharmonic functions) Let A ∈ X and λ > 0.

• A function f ∈ F+(X,X) is called λ-superharmonic on A if P f(x) ≤ λ f(x) for all x ∈ A.
• A function f ∈ F+(X,X) ∪ Fb(X,X) is called λ-harmonic on A if P f(x) = λ f(x) for all x ∈ A.

If A = X and the function f satisfies one of the previous conditions, it is simply called λ-superharmonic or λ-harmonic. In the case where λ = 1, the function f is said to be superharmonic or harmonic.

Proposition 2.19 Let A ∈X . Then,

(i) a potential function f ∈ F+(X,X) is superharmonic;
(ii) the function x ↦ Px(τA < ∞) is harmonic on A^c;
(iii) the function x ↦ Px(σA < ∞) is superharmonic;
(iv) the function x ↦ Px(NA = ∞) is harmonic.

Proof.
(i) If f = Ug with g ∈ F+(X,X), then P f = ∑_{k=1}^{∞} P^k g ≤ Ug = f.
(ii) Define f(x) = Px(τA < ∞) and note that P f(x) = Ex[f(X1)] = Ex[P_{X1}(τA < ∞)]. Using the relation σA = 1 + τA ∘ θ and applying the Markov property, we get

P f(x) = Ex[Px(τA ∘ θ < ∞ | F1)] = Px(τA ∘ θ < ∞) = Px(σA < ∞) .

If x ∈ A^c, Px(σA < ∞) = Px(τA < ∞), hence P f(x) = f(x).
(iii) Define g(x) = Px(σA < ∞). Along the same lines, we obtain

Pg(x) = Ex[g(X1)] = Ex[P_{X1}(σA < ∞)] = Px(σA ∘ θ < ∞) .

Since {σA ∘ θ < ∞} ⊂ {σA < ∞}, the previous relation implies that Pg(x) ≤ g(x) for all x ∈ X.
(iv) Define h(x) = Px(NA = ∞). Then Ph(x) = Ex[h(X1)] = Ex[P_{X1}(NA = ∞)] and, applying the Markov property, we obtain

Ph(x) = Ex[Px(NA ∘ θ = ∞ | F1)] = Px(NA ∘ θ = ∞) = Px(NA = ∞) = h(x) .

□


Proposition 2.20 If inf_{x∈B} Px(σA < ∞) = δ > 0, then {NB = ∞} ⊂ {NA = ∞}, Pξ-a.s. for all ξ ∈ M1(X).

Proof. Set g(x) = Px(σA < ∞). By Proposition 2.19-(iii), the function g is superharmonic and {g(Xn), n ≥ 0} is a nonnegative supermartingale. By the supermartingale convergence theorem (Corollary D.19), there exists a random variable Z such that P_{Xn}(σA < ∞) →n Z, Pξ-a.s. for all ξ ∈ M1(X). By the bounded convergence theorem, the convergence also holds in L¹(Pξ). We have, for F ∈ Fp,

Eξ[1F Z] = lim_{n→∞} Eξ[1F Pξ(σA ∘ θn < ∞ | Fn)] = lim_{n→∞} Pξ(F ∩ {σA ∘ θn < ∞}) .

Since ∩_{n=1}^{∞} {σA ∘ θn < ∞} = {NA = ∞} and {σA ∘ θ_{n+1} < ∞} ⊂ {σA ∘ θn < ∞}, we obtain by the monotone convergence theorem,

Eξ[1F Z] = Pξ({NA = ∞} ∩ F) .

Since the above identity is satisfied for any integer p and any F ∈ Fp, this implies Z = 1{NA=∞}, Pξ-a.s. Note that, since inf_{x∈B} Px(σA < ∞) = δ, we have

{NB = ∞} ⊂ {P_{Xn}(σA < ∞) ≥ δ i.o.} .

Since lim_{n→∞} P_{Xn}(σA < ∞) = 1{NA=∞}, Pξ-a.s., the previous identity implies that 1{NA=∞} ≥ δ on {NB = ∞}, Pξ-a.s., showing that {NB = ∞} ⊂ {NA = ∞}, Pξ-a.s. □

Corollary 2.21 Assume that inf_{x∈X} Px(σA < ∞) > 0. Then, for all x ∈ X, Px(NA = ∞) = 1.

Proof. The proof follows by setting B = X in Proposition 2.20. □

Proposition 2.22 Let A ∈X and λ > 0.

(i) A function f ∈ F+(X,X) is λ-superharmonic on A if and only if, for all ξ ∈ M1(X), {λ^{−(n∧τ_{A^c})} f(X_{n∧τ_{A^c}}), n ∈ N} is a Pξ-supermartingale.
(ii) A function h ∈ F+(X,X) ∪ Fb(X,X) is λ-harmonic on A if and only if, for all ξ ∈ M1(X), {λ^{−(n∧τ_{A^c})} h(X_{n∧τ_{A^c}}), n ∈ N} is a Pξ-martingale.

Proof. For convenience, we set τ = τ_{A^c} and Mn = λ^{−(n∧τ)} f(X_{n∧τ}). Since τ is a stopping time, Mτ 1{τ≤n} is Fn-measurable. Assume first that f is λ-superharmonic on A. For ξ ∈ M1(X) we have, Pξ-a.s.,

Eξ[M_{n+1} | Fn] = Eξ[M_{n+1} 1{τ≤n} + M_{n+1} 1{τ>n} | Fn]
= λ^{−τ} f(Xτ) 1{τ≤n} + λ^{−(n+1)} 1{τ>n} Eξ[f(X_{n+1}) | Fn]
= λ^{−τ} f(Xτ) 1{τ≤n} + 1{τ>n} λ^{−(n+1)} P f(Xn) .

Since Xn ∈ A on {τ > n}, we have P f(Xn) ≤ λ f(Xn) on {τ > n}, which implies that

Eξ[M_{n+1} | Fn] ≤ λ^{−τ} f(Xτ) 1{τ≤n} + 1{τ>n} λ^{−n} f(Xn) = Mn .

Thus {(Mn, Fn), n ∈ N} is a Pξ-supermartingale.
Conversely, assume that {(Mn, Fn), n ∈ N} is a Pξ-supermartingale. If x ∈ A, then τ ≥ 1, Px-a.s. Thus, for all x ∈ A,

f(x) ≥ Ex[λ^{−(1∧τ)} f(X_{1∧τ}) | F0] = λ⁻¹ Ex[f(X1) | F0] = λ⁻¹ P f(x) .

The case of a λ-harmonic function is dealt with by replacing inequalities by equalities in the previous derivations. □

Corollary 2.23 Let A ∈ X, let V : X → [1,∞) be a measurable function and let λ ∈ (0,1). If PV ≤ λV on A^c, then Ex[λ^{−τA}] ≤ V(x) for all x ∈ X.

Proof. By Proposition 2.22, Mn = λ^{−(n∧τA)} V(X_{n∧τA}) is a positive Px-supermartingale for all x ∈ X. Therefore

λ^{−n} Px(τA = ∞) = Ex[λ^{−(n∧τA)} 1{τA=∞}] ≤ Ex[λ^{−(n∧τA)} V(X_{n∧τA})] ≤ Ex[M0] = V(x) .

This implies that Px(τA < ∞) = 1 for all x ∈ X. Hence, applying Fatou's lemma yields

Ex[λ^{−τA}] ≤ Ex[liminf_{n→∞} λ^{−(τA∧n)} V(X_{τA∧n})] ≤ liminf_{n→∞} Ex[λ^{−(τA∧n)} V(X_{τA∧n})] ≤ V(x) .

□

2.4 The Dirichlet and Poisson problems

Definition 2.24 (β-Dirichlet problem) Let A ∈ X, β > 0, and f ∈ F+(X,X). A nonnegative function u ∈ F+(X,X) is a solution of the β-Dirichlet problem if

u(x) = f(x) for x ∈ A , u(x) = β Pu(x) for x ∈ A^c . (2.11)

For any set A ∈ X and β > 0, we define a submarkovian kernel (see Definition 1.5) P_A^β by

P_A^β(x,B) = Ex[1{τA<∞} β^{τA} 1B(X_{τA})] , x ∈ X , B ∈ X . (2.12)

Equivalently, for f ∈ F+(X,X),

P_A^β f(x) = Ex[1{τA<∞} β^{τA} f(X_{τA})] . (2.13)


In particular, when β = 1, then for all x ∈ X, P_A^1(x,X) = Px(τA < ∞) is the probability that the chain starting from x eventually hits the set A.

Proposition 2.25 For any A ∈ X and f ∈ F+(X,X), the function P_A^β f is a solution of the β-Dirichlet problem.

Proof. If x ∈ A, then by definition, P_A^β f(x) = f(x). For x ∈ X, the Markov property yields

P(P_A^β f)(x) = Ex[P_A^β f(X1)] = Ex[(1{τA<∞} β^{τA} f(X_{τA})) ∘ θ1]
= Ex[1{τA∘θ1<∞} β^{τA∘θ1} f(X_{1+τA∘θ1})]
= β⁻¹ Ex[1{σA<∞} β^{σA} f(X_{σA})] ,

where we have used σA = 1 + τA ∘ θ1. For x ∉ A, σA = τA Px-a.s., showing that for x ∉ A,

β P(P_A^β f)(x) = Ex[1{τA<∞} β^{τA} f(X_{τA})] = P_A^β f(x) .

The proof is completed. □

For A ∈ X, β > 0 and g ∈ F+(A^c, X_{A^c}) define

G_A^β g(x) = 1_{A^c}(x) Ex[∑_{k=0}^{τA−1} β^k g(Xk)] . (2.14)

Note that G_A^β g is nonnegative but we do not assume that it is finite.

Proposition 2.26 Let A ∈ X, β > 0 and g ∈ F+(A^c, X_{A^c}). Then the function G_A^β g is a solution of

u(x) = 0 for x ∈ A , u(x) = β Pu(x) + g(x) for x ∈ A^c . (2.15)

Proof. Set u(x) = G_A^β g(x) = Ex[S], where S = 1_{A^c}(X0) ∑_{k=0}^{τA−1} β^k g(Xk). Note first that u(x) = 0 for x ∈ A. Now, using the Markov property and the relation σA = 1 + τA ∘ θ1, we obtain

Pu(x) = Ex[u(X1)] = Ex[E_{X1}[S]] = Ex[Ex[S ∘ θ1 | F1]] = Ex[S ∘ θ1]
= Ex[1_{A^c}(X1) ∑_{k=1}^{τA∘θ1} β^{k−1} g(Xk)] = Ex[∑_{k=1}^{σA−1} β^{k−1} g(Xk)] ,

where the last equality follows from 1A(X1) ∑_{k=1}^{σA−1} β^{k−1} g(Xk) = 0 Px-a.s. Using that σA = τA Px-a.s. for all x ∉ A, we obtain for all x ∉ A,

g(x) + β Pu(x) = g(x) + Ex[∑_{k=1}^{σA−1} β^k g(Xk)] = Ex[1_{A^c}(X0) ∑_{k=0}^{τA−1} β^k g(Xk)] = u(x) .

□

Theorem 2.27. Given f ∈ F+(A, X_A) and g ∈ F+(A^c, X_{A^c}), the function u(x) = P_A^β f(x) + G_A^β g(x) is a solution of

u(x) = f(x) for x ∈ A , u(x) = β Pu(x) + g(x) for x ∈ A^c . (2.16)

Furthermore, if v ∈ F+(X,X) satisfies

v(x) ≥ f(x) for x ∈ A , v(x) ≥ β Pv(x) + g(x) for x ∈ A^c , (2.17)

then v(x) ≥ u(x) for all x ∈ X.

Proof. (2.16) follows by combining Proposition 2.25 with Proposition 2.26. Now, assume that (2.17) holds. We set

Y0 = v(X0) , Yn = β^n v(Xn) + 1_{A^c}(X0) ∑_{k=0}^{n−1} β^k g(Xk) , n ≥ 1 .

For all x ∈ X, we have Px-a.s.,

Ex[Y_{1∧τA} | F0] = 1A(X0) Ex[Y0 | F0] + 1_{A^c}(X0) Ex[Y1 | F0]
= 1A(X0) v(X0) + 1_{A^c}(X0) {β Pv(X0) + g(X0)}
≤ 1A(X0) v(X0) + 1_{A^c}(X0) v(X0) = v(X0) = Y_{0∧τA} ,

and, for n > 0,

Ex[Y_{(n+1)∧τA} | Fn] = 1{τA≤n} Y_{τA} + 1{τA>n} {β^{n+1} Ex[v(X_{n+1}) | Fn] + ∑_{k=0}^{n} β^k g(Xk)}
= 1{τA≤n} Y_{τA} + 1{τA>n} {β^n (β Pv(Xn) + g(Xn)) + ∑_{k=0}^{n−1} β^k g(Xk)}
≤ 1{τA≤n} Y_{τA} + 1{τA>n} {β^n v(Xn) + ∑_{k=0}^{n−1} β^k g(Xk)}
= 1{τA≤n} Y_{τA} + 1{τA>n} Yn = Y_{n∧τA} .

Therefore {Y_{n∧τA}, n ∈ N} is a nonnegative Px-supermartingale for all x ∈ X, which implies


v(x) = Ex[Y0] ≥ Ex[β^{n∧τA} v(X_{n∧τA})] + Ex[1_{A^c}(X0) ∑_{k=0}^{n∧τA−1} β^k g(Xk)] .

Then, applying Fatou's lemma, we obtain for all x ∈ X,

v(x) ≥ liminf_{n→∞} Ex[β^{n∧τA} v(X_{n∧τA})] + liminf_{n→∞} Ex[1_{A^c}(X0) ∑_{k=0}^{n∧τA−1} β^k g(Xk)]
≥ Ex[liminf_{n→∞} β^{n∧τA} v(X_{n∧τA})] + Ex[1_{A^c}(X0) ∑_{k=0}^{τA−1} β^k g(Xk)]
≥ Ex[1{τA<∞} β^{τA} v(X_{τA})] + G_A^β g(x)
≥ Ex[1{τA<∞} β^{τA} f(X_{τA})] + G_A^β g(x) = P_A^β f(x) + G_A^β g(x) = u(x) .

□

We now state several useful consequences of Theorem 2.27.

Corollary 2.28 The function x ↦ Px(τA < ∞) is the smallest solution of the system of inequations u(x) ≥ Pu(x) for x ∉ A and u(x) ≥ 1 for x ∈ A.

Proof. Apply Theorem 2.27 with β = 1, f = 1A and g = 0. □

Corollary 2.29 The function wA : x ↦ Ex[τA] is the smallest solution of the system of inequations Pu(x) ≤ u(x) − 1 for x ∉ A and u(x) ≥ 0 for x ∈ A.

Proof. We apply Theorem 2.27 with β = 1, f = 0 and g = 1_{A^c}. In this case,

wA(x) = G_A^1 g(x) = 1_{A^c}(x) Ex[∑_{k=0}^{τA−1} 1_{A^c}(Xk)] = 1_{A^c}(x) Ex[τA] = Ex[τA] .

□

Corollary 2.30 Let g ∈ F+(X,X). Then u = Ug is a solution of the equation u = Pu + g. If w ∈ F+(X,X) satisfies the inequation

w ≥ Pw + g , (2.18)

then u ≤ w, i.e. u is the smallest solution of (2.18).

Proof. We apply Theorem 2.27 with A = ∅ and β = 1. □

Remark 2.31
• If g ∈ F+(X,X) and Ug is finite, then PUg = Ug − g is also finite and Ug is a solution of the equation u − Pu = g, which is called the Poisson equation for g.
• If g ∈ F(X,X) is such that U|g| < ∞, then Ug is a solution of the Poisson equation for g.


2.5 Riesz decomposition

Theorem 2.32 (Riesz decomposition). A finite (nonnegative) superharmonic function f can be decomposed uniquely as f = h + Ug, where h is a harmonic function and Ug a potential. Furthermore, h = lim_{n→∞} P^n f and g = f − P f.

Proof. Since f is superharmonic, the function g = f − P f is nonnegative. This implies that {P^n f, n ∈ N} is a nonincreasing sequence of nonnegative functions and thus h = lim_{n→∞} P^n f is well-defined. Note that h is harmonic since, by the dominated convergence theorem (P^n f ≤ f for all n ∈ N and P f ≤ f < ∞),

Ph = P(lim_{n→∞} P^n f) = lim_{n→∞} P^{n+1} f = h .

Then, for all n ≥ 1, ∑_{k=0}^{n−1} P^k g = f − P^n f. Taking the limit as n goes to infinity, we obtain Ug = f − h.
We now show that this decomposition is unique. Assume that f = h̃ + Ug̃ with h̃ harmonic and g̃ ∈ F+(X,X). Since h̃ is harmonic, P^n h̃ = h̃, which implies P^n f = h̃ + ∑_{k=n}^{∞} P^k g̃. On the other hand, since Ug̃(x) < ∞, lim_{n→∞} ∑_{k=n}^{∞} P^k g̃(x) = 0, which shows that h̃ = lim_{n→∞} P^n f = h. Then Ug̃ = f − h̃ = Ug, which implies g̃ = g since Ug = g + PUg and Ug̃ = g̃ + PUg̃. The proof is completed. □

Proposition 2.33 For B ∈ X, the Riesz decomposition of the superharmonic function f(x) = Px(τB < ∞) is given by h(x) = Px(NB = ∞) and g(x) = 1B(x) Px(σB = ∞).

Proof. The Markov property implies

P^n f(x) = Ex[f(Xn)] = Ex[P_{Xn}(τB < ∞)] = Px(τB ∘ θn < ∞) = Px(⋃_{k≥n} {Xk ∈ B}) ,

and the harmonic part of f in the Riesz decomposition is

h(x) = lim_{n→∞} P^n f(x) = lim_{n→∞} Px(⋃_{k≥n} {Xk ∈ B}) = Px(limsup_{k→∞} {Xk ∈ B}) = Px(NB = ∞) .

Then f = h + Ug with

g(x) = f(x) − P f(x) = Px(⋃_{k≥0} {Xk ∈ B}) − Px(⋃_{k≥1} {Xk ∈ B}) = Px(X0 ∈ B, Xn ∉ B, n ≥ 1) = 1B(x) Px(σB = ∞) .

□


2.6 The Dynkin Formula

Let (Ω, F, {Fn, n ∈ N}, P) be a filtered probability space.

Proposition 2.34 (Dynkin's formula for Markov chains) Let f ∈ Fb(X,X) and let P be a Markov kernel. Then, for all x ∈ X and all stopping times τ such that Ex[τ] < ∞,

Ex[f(Xτ)] − f(x) = Ex[∑_{k=0}^{τ−1} (P − I) f(Xk)] .
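For the deterministic stopping time τ ≡ n, Dynkin's formula reduces to the telescoping identity P^n f − f = ∑_{k=0}^{n−1} P^k (P−I) f, which is easy to verify numerically (toy kernel of our own choosing; numpy assumed):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.1, 0.5]])
f = np.array([1.0, -2.0, 3.0])
n = 7

lhs = np.linalg.matrix_power(P, n) @ f - f    # Ex[f(X_n)] - f(x) for every x
rhs = sum(np.linalg.matrix_power(P, k) @ ((P - np.eye(3)) @ f) for k in range(n))
print(np.allclose(lhs, rhs))                  # the sum telescopes for every starting state
```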

Proposition 2.35 Let f ∈ F+(X,X). Then, for all x ∈ X and all stopping times τ such that Px(τ < ∞) = 1,

Ex[f(Xτ)] + Ex[∑_{k=0}^{τ−1} f(Xk)] = f(x) + Ex[∑_{k=0}^{τ−1} P f(Xk)] .

The proofs of Proposition 2.34 and Proposition 2.35 follow directly from Proposition 2.36 and Corollary 2.37 below.

Proposition 2.36 (Dynkin's formula) Let {(Zn, Fn), n ∈ N} be a bounded adapted process and τ an integrable stopping time. Then

E[Zτ] − E[Z0] = E[∑_{k=0}^{τ−1} E[Z_{k+1} − Zk | Fk]] .

Proof. We set U0 = 0 and Un = Zn − Z0 − ∑_{k=0}^{n−1} {E[Z_{k+1} | Fk] − Zk} for n ≥ 1. We have U_{n+1} − Un = Z_{n+1} − E[Z_{n+1} | Fn], showing that E[U_{n+1} − Un | Fn] = 0. Therefore, {(Un, Fn), n ∈ N} is a martingale and E[U_{n∧τ}] = E[U0] = 0. This implies

E[Z_{n∧τ}] − E[Z0] = E[∑_{k=0}^{n∧τ−1} {E[Z_{k+1} | Fk] − Zk}] .

We conclude by Lebesgue's theorem, since {Zn, n ∈ N} is bounded and the stopping time τ is integrable. □

Corollary 2.37 Let {(Zn, Fn), n ∈ N} be an adapted nonnegative process and τ a stopping time such that P(τ < ∞) = 1. Then,

E[Zτ] + E[∑_{k=0}^{τ−1} Zk] = E[Z0] + E[∑_{k=0}^{τ−1} E[Z_{k+1} | Fk]] .

Proof. Applying Proposition 2.36 to the finite stopping time τ ∧ n and the bounded process {Zn^M, n ∈ N}, where Zn^M = Zn ∧ M, we get

E[Z^M_{τ∧n}] + E[∑_{k=0}^{τ∧n−1} Z^M_k] = E[Z^M_0] + E[∑_{k=0}^{τ∧n−1} E[Z^M_{k+1} | Fk]] .


Using Lebesgue's theorem and the monotone convergence theorem, we obtain by taking the limit as n goes to infinity

E[Z^M_τ] + E[∑_{k=0}^{τ−1} Z^M_k] = E[Z^M_0] + E[∑_{k=0}^{τ−1} E[Z^M_{k+1} | Fk]] .

We conclude by using again the monotone convergence theorem as M goes to infinity. □

Theorem 2.38 (Comparison Theorem). Let {Vn, n ∈ N}, {Yn, n ∈ N}, and {Zn, n ∈ N} be three {Fn, n ∈ N}-adapted nonnegative processes such that Yn < ∞ and Zn < ∞ P-a.s. for every n ∈ N and

E[V_{n+1} | Fn] ≤ Vn − Zn + Yn . (2.19)

Then, for all stopping times τ,

E[Vτ 1{τ<∞}] + E[∑_{k=0}^{τ−1} Zk] ≤ E[V0] + E[∑_{k=0}^{τ−1} Yk] . (2.20)

Proof. Set U0 = V0, Un = Vn + ∑_{k=0}^{n−1} (Zk − Yk) and, for all a > 0,

τa = inf{k ≥ 0 : ∑_{j=0}^{k} Yj > a} .

By (2.19), {Un, n ∈ N} is a supermartingale. Define Wk = a + U_{k∧τa}. Then, by Proposition D.6, {Wk, k ∈ N} is a nonnegative supermartingale. Therefore, for each fixed n ≥ 0,

E[U_{τ∧τn∧n}] = E[W_{τ∧τn∧n}] − a ≤ E[W0] − a = E[U0] .

This implies that

E[V_{τ∧τn∧n} 1{τ<∞}] + E[∑_{k=0}^{τ∧τn∧n−1} Zk] ≤ E[V_{τ∧τn∧n}] + E[∑_{k=0}^{τ∧τn∧n−1} Zk] ≤ E[V0] + E[∑_{k=0}^{τ∧τn∧n−1} Yk] .

Since τn ∧ n → ∞, by Fatou's lemma and the monotone convergence theorem, letting n tend to infinity yields


E[Vτ 1{τ<∞}] + E[∑_{k=0}^{τ−1} Zk] = E[liminf_{n→∞} {V_{τ∧τn∧n} 1{τ<∞} + ∑_{k=0}^{τ∧τn∧n−1} Zk}]
≤ E[V0] + liminf_{n→∞} E[∑_{k=0}^{τ∧τn∧n−1} Yk] = E[V0] + E[∑_{k=0}^{τ−1} Yk] .

□

Corollary 2.39 Assume that PV ≤ V − f + b 1C for some V : X → [0,∞], f : X → [0,∞), C ∈ X and b < ∞. Then, for every x ∈ X,

Ex[V(X_{σC}) 1{σC<∞}] + Ex[∑_{k=0}^{σC−1} f(Xk)] ≤ V(x) + b 1C(x) .

Proof. Apply Theorem 2.38 with Zk = f(Xk), Vk = V(Xk) and Yk = b 1C(Xk), which yields Ex[∑_{k=0}^{σC−1} b 1C(Xk)] = b 1C(x), since Xk ∉ C for k ∈ {1, . . . , σC−1}. □

Corollary 2.40 Assume that there exist V : X → [1,∞) and C ∈ X such that

PV + 1 ≤ V + b 1C , b ∈ R+ . (2.21)

Then, for all x ∈ X,

Ex[σC] ≤ V(x) + b 1C(x) .

Proof. Set f ≡ 1 in Corollary 2.39. □
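A drift condition of the form (2.21) is easy to exhibit on a toy chain and to compare with the exact value of Ex[σC] (the chain, the Lyapunov function V and the set C below are our own illustrative choices; numpy assumed):

```python
import numpy as np

# chain on {0,...,5}: from x >= 1 move deterministically to x-1; from 0 jump uniformly
N = 6
P = np.zeros((N, N))
P[0, :] = 1.0 / N
for x in range(1, N):
    P[x, x - 1] = 1.0

C = np.eye(N)[0]                  # indicator of C = {0}
V = np.arange(N) + 1.0            # candidate Lyapunov function V(x) = x + 1
b = (P[0] @ V + 1) - V[0]         # smallest b making PV + 1 <= V + b 1_C hold
assert np.all(P @ V + 1 <= V + b * C + 1e-12)   # (2.21) holds

# exact Ex[sigma_C]: x steps from x >= 1; from 0, one step plus the mean restart cost
esigma = np.array([1 + np.arange(N).mean()] + list(range(1, N)))
print(np.all(esigma <= V + b * C))   # the bound of Corollary 2.40 holds
```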

2.7 Examples

2.7.1 The gambler’s ruin

Consider the simple symmetric random walk introduced in Section 1.9.1. Let {Zk, k ∈ N∗} be a sequence of i.i.d. random variables taking values in {−1,1} with P(Zk = 1) = P(Zk = −1) = 1/2. Denote by Xn the gambler's current wealth, i.e.

Xn = X0 + Z1 + Z2 + · · · + Zn ,

where X0 is the gambler's initial wealth. Assume that the gambler stops the game when his wealth reaches either the upper barrier a or the lower barrier −b, a and b being positive integers. The gambler's wealth is a Markov chain on the state space X = {−b, . . . , a}. Let τ be the hitting time of the set {−b, a}, i.e. τ = inf{k ≥ 0 : Xk ∈ {−b, a}}. We want to compute the probability that the game ends in finite time. Define the function u on X by u(x) = Px(τ < ∞). Then u is harmonic on X \ {−b, a}


by Proposition 2.19-(ii). Thus, for x ∈ X \ {−b, a},

u(x) = Pu(x) = (1/2) u(x−1) + (1/2) u(x+1) . (2.22)

This implies that u(x+1) − u(x) = u(x) − u(x−1) and

u(x) = u(−b) + (x+b) {u(−b+1) − u(−b)} (2.23)

for all x ∈ X \ {−b, a}. Since u(a) = u(−b) = 1, this yields u(−b+1) = u(−b) and thus u(x) = 1, i.e. Px(τ < ∞) = 1, for all x ∈ X. Therefore, the game almost surely ends in finite time for any initial wealth x ∈ {−b, . . . , a}.

We now compute the probability u(x) = Px(τa < τ−b) of winning. We can also write u(x) = Ex[1{a}(Xτ)]. Theorem 2.27 (with β = 1 and f = 1{a}) shows that u is the smallest nonnegative solution of the equations

u(x) = Pu(x) , x ∈ X \ {−b, a} , u(−b) = 0 , u(a) = 1 .

We have established in (2.23) that the harmonic functions on X \ {−b, a} are given by u(x) = u(−b) + (x+b){u(−b+1) − u(−b)}. Since u(−b) = 0, this yields u(x) = (x+b) u(−b+1) for all x ∈ {−b, . . . , a}. The boundary condition u(a) = 1 implies that u(−b+1) = 1/(a+b). Therefore, the probability of winning when the initial wealth is x equals u(x) = (x+b)/(a+b).
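The closed form u(x) = (x+b)/(a+b) can be cross-checked by solving the boundary value problem numerically (the barriers a and b below are arbitrary choices of ours; numpy assumed):

```python
import numpy as np

a, b = 5, 3                       # upper barrier a, lower barrier -b
states = list(range(-b, a + 1))   # X = {-b, ..., a}
n = len(states)

# linear system: u(x) - (u(x-1) + u(x+1)) / 2 = 0 inside, u(-b) = 0, u(a) = 1
M, rhs = np.eye(n), np.zeros(n)
for i in range(1, n - 1):
    M[i, i - 1] = M[i, i + 1] = -0.5
rhs[-1] = 1.0
u = np.linalg.solve(M, rhs)

print(np.allclose(u, [(x + b) / (a + b) for x in states]))   # matches the closed form
```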

We now compute the expected duration of the game. Let τ = τa ∧ τ−b be the hitting time of the set {−b, a}. By Theorem 2.27, u(x) = Ex[τ] is the smallest solution of the modified Dirichlet problem (2.15). This yields the following recurrence equation (which differs from (2.22) by an additional constant term): for x ∈ {−b+1, . . . , a−1},

u(x) = (1/2) u(x−1) + (1/2) u(x+1) + 1 . (2.24)

The boundary conditions are u(−b) = 0 and u(a) = 0. Define ∆u(x−1) = u(x) − u(x−1) and ∆²u(x−1) = u(x+1) − 2u(x) + u(x−1). Equation (2.24) implies that, for x ∈ {−b+1, . . . , a−1},

∆²u(x−1) = −2 . (2.25)

The boundary conditions imply that the only solution of (2.25) is given by

u(x) = (a−x)(x+b) , x = −b, . . . , a . (2.26)
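One can confirm directly that (2.26) satisfies the boundary conditions and the recurrence (2.24); a short check (barriers chosen arbitrarily by us):

```python
a, b = 5, 3
u = lambda x: (a - x) * (x + b)   # candidate solution (2.26)

assert u(-b) == 0 and u(a) == 0   # boundary conditions
# recurrence (2.24) on the interior: u(x) = u(x-1)/2 + u(x+1)/2 + 1
assert all(u(x) == 0.5 * u(x - 1) + 0.5 * u(x + 1) + 1 for x in range(-b + 1, a))
print("(2.26) solves (2.24)")
```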

2.7.2 Birth-and-death chain



Consider the birth-and-death chain introduced in Section 1.9.3. Let h(x) be the extinction probability starting from x, i.e. h(x) = Px(τ0 < ∞).

By Proposition 2.25, the function h is the smallest solution of the Dirichlet problem (2.11) with f = 1 and A = {0}. The equation Ph(x) = h(x) for x > 0 yields

h(x) = px h(x+1) + qx h(x−1) .

Note that h is nonincreasing, and define u(x) = h(x−1) − h(x). Then px u(x+1) = qx u(x), and we obtain by induction that u(x+1) = γ(x) u(1), with γ(0) = 1 and

γ(x) = (qx qx−1 · · · q1) / (px px−1 · · · p1) .

This yields, for x ≥ 1,

h(x) = h(0) − u(1) − · · · − u(x) = 1 − u(1) {γ(0) + · · · + γ(x−1)} .

At this point u(1) remains to be determined. If ∑_{x=0}^{∞} γ(x) = ∞, the restriction 0 ≤ h(x) ≤ 1 imposes u(1) = 0 and h(x) = 1 for all x. If ∑_{x=0}^{∞} γ(x) < ∞, we can choose any value u(1) > 0 such that 1 − u(1) ∑_{x=0}^{∞} γ(x) ≥ 0. Therefore, the minimal nonnegative solution of the Dirichlet problem is obtained by setting u(1) = (∑_{x=0}^{∞} γ(x))⁻¹, which yields the solution

h(x) = ∑_{y=x}^{∞} γ(y) / ∑_{y=0}^{∞} γ(y) .

In this case, for x ∈ N∗, we have h(x) < 1, so the population survives with positiveprobability.
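In the special case of constant probabilities px = p and qx = q, γ(x) = (q/p)^x, and the formula can be checked numerically against the relation h(x) = p h(x+1) + q h(x−1) by truncating the infinite sums (p and q below are our own toy choices):

```python
# constant birth/death probabilities with p > q, so that sum gamma(x) < infinity
p, q = 0.6, 0.4
gamma = lambda x: (q / p) ** x                       # gamma(x) = (q_x...q_1)/(p_x...p_1)
S = sum(gamma(y) for y in range(500))                # truncated sum_{y>=0} gamma(y)
h = lambda x: sum(gamma(y) for y in range(x, 500)) / S

assert h(0) == 1.0                                   # extinction is certain from 0
assert all(abs(h(x) - (p * h(x + 1) + q * h(x - 1))) < 1e-9 for x in range(1, 40))
print("survival probability from x = 1:", 1 - h(1))
```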

2.7.3 Simple random walk on Z

Consider again the simple random walk on Z, not necessarily symmetric, that is, a Markov chain on Z with kernel P defined by P(x, x+1) = p and P(x, x−1) = q for all x ∈ Z, where p ∈ (0,1) and p + q = 1. Let a < b ∈ Z and let τ be the hitting time of A, where A = {a+1, . . . , b−1}^c. Our purpose is to compute the moments of τ. For x ∈ A, Px(τ = 0) = 1. For x ∈ A^c we have

Px(τ ≤ b−a) ≥ Px(X1 = x+1, X2 = x+2, . . . , X_{b−x} = b) ≥ p^{b−x} ≥ p^{b−a} > 0 .

This implies that Px(τ > b−a) ≤ 1−γ for all x ∈ A^c, where γ = p^{b−a}. Next, for any x ∈ A^c and k ∈ N∗, the Markov property implies


Px(τ > k(b−a)) = Px(τ > (k−1)(b−a), τ ∘ θ_{(k−1)(b−a)} > b−a)
= Ex[1{τ>(k−1)(b−a)} P_{X_{(k−1)(b−a)}}(τ > b−a)]
≤ (1−γ) Px(τ > (k−1)(b−a)) ,

which by induction yields, for every x ∈ A^c,

Px(τ > k(b−a)) ≤ (1−γ)^k .

For n ≥ b−a, writing n = k(b−a) + r with r ∈ {0, . . . , (b−a)−1}, we get for any x ∈ A^c,

Px(τ > n) ≤ Px(τ > k(b−a)) ≤ (1−γ)^k ≤ (1−γ)^{(n−(b−a))/(b−a)} .

This tail bound implies that Ex[τ^s] < ∞ for any s > 0. Let g be a nonnegative function on A^c. Consider the system of equations

u(x) = g(x) + Pu(x) , a < x < b , u(a) = α , u(b) = β . (2.27)

Proposition 2.26 shows that u1(x) = Ex[τA] is the minimal solution of (2.27) with g(x) = 1_{A^c}(x), α = 0 and β = 0. For s = 2 and every x ∈ A^c, we have

u2(x) = Ex[τ²] = Ex[(1 + τ ∘ θ)²]
= 1 + 2 Ex[τ ∘ θ] + Ex[τ² ∘ θ]
= 1 + 2 Ex[E[τ ∘ θ | F1]] + Ex[E[τ² ∘ θ | F1]]
= 1 + 2 Ex[E_{X1}[τ]] + Ex[E_{X1}[τ²]]
= 1 + 2 Pu1(x) + Pu2(x) .

Therefore, u2 is the finite solution of the system (2.27) with g(x) = 1 + 2 Pu1(x) for x ∈ A^c, α = 0 and β = 0. Similarly, for x ∈ A^c, it holds that

u3(x) = Ex[τ³] = Ex[(1 + τ ∘ θ1)³]
= 1 + 3 Ex[τ ∘ θ1] + 3 Ex[τ² ∘ θ1] + Ex[τ³ ∘ θ1]
= 1 + 3 Pu1(x) + 3 Pu2(x) + Pu3(x) ,

which implies that u3 is the finite solution of the system (2.27) with g(x) = 1 + 3 Pu1(x) + 3 Pu2(x) for x ∈ A^c, α = 0 and β = 0. All the moments of the hitting time τ can thus be computed inductively.
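The recursions above translate into linear systems: on the interior, with Q the restriction of the kernel, (I−Q)u1 = 1 and (I−Q)u2 = 1 + 2Qu1, since u1 and u2 vanish on A. In the symmetric case p = q = 1/2, the first moment must equal the gambler's-ruin duration (x−a)(b−x); a numerical sketch (interval chosen by us, numpy assumed):

```python
import numpy as np

p = q = 0.5
a, b = 0, 6
interior = list(range(a + 1, b))   # A^c = {a+1, ..., b-1}
m = len(interior)

Q = np.zeros((m, m))               # restriction of the walk's kernel to the interior
for i in range(m):
    if i > 0: Q[i, i - 1] = q
    if i < m - 1: Q[i, i + 1] = p

I = np.eye(m)
u1 = np.linalg.solve(I - Q, np.ones(m))                # (I - Q) u1 = 1 gives Ex[tau]
u2 = np.linalg.solve(I - Q, np.ones(m) + 2 * Q @ u1)   # u2 = 1 + 2 P u1 + P u2

print(np.allclose(u1, [(x - a) * (b - x) for x in interior]))   # symmetric-case check
```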

We will finally show that the system of equations (2.27) has a unique finite solu-tion on a, . . . ,b. For x ∈ Ac, the equation u(x)−Pu(x) = g(x) is equivalent to

u(x+1)−u(x) = ρ[u(x)−u(x−1)]− p−1g(x) . (2.28)

Set ∆u(x+1) = u(x+1)−u(x). Then, (2.28) yields, for x = a+1, . . . ,b,

Page 79: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x

2.7 Examples 67

\[
\Delta u(x) = \rho^{x-a-1}\,\Delta u(a+1) - p^{-1}\sum_{y=0}^{x-a-2} \rho^y\, g(x-y-1) , \tag{2.29}
\]

and a solution $u$ of (2.28) is uniquely determined by $u(a)$ and $u(a+1)$. Let us compute this unique solution, denoted by $\varphi$, in the case where $\varphi(a+1) = 1$, $\varphi(a) = 0$ and $g(x) = 0$ for every $x \in \{a+1,\dots,b\}$. Applying (2.29), we obtain, for $x \in \{a+1,\dots,b\}$, $\varphi(x) - \varphi(x-1) = \rho^{x-a-1}$, which implies
\[
\varphi(x) = \sum_{y=a+1}^{x} \rho^{y-a-1} =
\begin{cases}
(1-\rho^{x-a})/(1-\rho) & \text{if } \rho \neq 1 ,\\
x-a & \text{otherwise.}
\end{cases}
\]

We next find the unique solution $\psi$ of (2.28) such that $\psi(a) = \psi(a+1) = 0$ for an arbitrary function $g$. Equation (2.29) becomes, for $x \in \{a+1,\dots,b\}$,
\[
\Delta\psi(x) = -p^{-1}\sum_{y=0}^{x-a-2} \rho^y\, g(x-y-1) ,
\]
and this yields
\[
\psi(x) = -p^{-1}\sum_{z=a+1}^{x}\ \sum_{y=0}^{z-a-2} \rho^y\, g(z-y-1) . \tag{2.30}
\]

We can now find the unique solution of (2.27) for any function $g$ and any boundary values $\alpha$, $\beta$. Set
\[
w = \alpha + \gamma\varphi + \psi \tag{2.31}
\]
with $\gamma = \varphi(b)^{-1}\,(\beta - \alpha - \psi(b))$ (which is well defined since $\varphi(b) > 0$). By construction, $w(a) = \alpha$, $w(x) = Pw(x) + g(x)$ for all $x \in \{a+1,\dots,b-1\}$, and $w(b) = \alpha + \gamma\varphi(b) + \psi(b) = \beta$.
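The system (2.27) is also straightforward to solve numerically on a small state space. The sketch below is a throwaway illustration, not part of the text: the values $a = 0$, $b = 10$, $p = 1/2$ and $g \equiv 1$ are assumed, so the solution is $u(x) = E_x[\tau]$ for the symmetric nearest-neighbour walk, which can be checked against the classical closed form $E_x[\tau] = (x-a)(b-x)$.

```python
import numpy as np

# Minimal sketch of solving (2.27) by linear algebra; a, b, p are
# illustrative choices (symmetric walk on {0,...,10}).
a, b, p = 0, 10, 0.5
n = b - a + 1
P = np.zeros((n, n))
for i in range(1, n - 1):            # nearest-neighbour transitions on the interior
    P[i, i + 1] = p
    P[i, i - 1] = 1 - p

# Solve u = g + P u on {a+1,...,b-1} with u(a) = u(b) = 0 and g = 1,
# i.e. (I - P) u = 1 restricted to the interior (boundary terms vanish).
A = np.eye(n - 2) - P[1:n - 1, 1:n - 1]
u = np.linalg.solve(A, np.ones(n - 2))   # u[i] = E_{a+1+i}[tau]
```

For $p \neq 1/2$ the same code applies; only the closed-form comparison is specific to the symmetric case.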

2.7.4 Monte Carlo methods for the Dirichlet problem

The Laplace equation comes up in a wide variety of physical contexts (fluid mechanics, electromagnetism, etc.) and is among the most important of all partial differential equations. The Dirichlet problem can be formulated as follows. Consider a connected bounded domain $D \subset \mathbb{R}^d$ and denote by $\partial D$ the boundary of this domain. Given a function $f$ defined on $\partial D$, we attempt to find a solution of
\[
\Delta u(r) = \sum_{k=1}^{d} \partial_k^2 u(r) = 0 \quad \text{for all } r \in D , \qquad u(r) = f(r) \quad \text{for all } r \in \partial D , \tag{2.32}
\]
where $\partial_k^2$ denotes the second partial derivative with respect to the $k$-th coordinate. This problem may serve to model the distribution of heat in a plate connected to heat sources located on its boundary. The plate is represented by the domain $D$, $u(r)$ is



the temperature at $r$, and $f(r)$ is the temperature on the plate boundary. There are several ways to solve this equation; we consider here a Monte Carlo method.

The first step consists in discretizing the domain into boxes of size $1/L$. We denote by $D_L$ the discretized domain,
\[
D_L = \big\{ L^{-1}(i_1, i_2, \dots, i_d) \in D : (i_1, i_2, \dots, i_d) \in \mathbb{Z}^d \big\} = D \cap (1/L)\mathbb{Z}^d .
\]
The discrete boundary $\partial D_L$ is defined as the set of points of $D^c \cap L^{-1}\mathbb{Z}^d$ whose distance to $D$ is less than $1/L$. If the function $u$ is sufficiently regular, a Taylor expansion yields, for $x \in D_L$,

\[
\begin{aligned}
u(x+L^{-1}e_k) - u(x) &= (1/L)\,\partial_k u(x) + (1/2L^2)\,\partial_k^2 u(x) + O(L^{-3}) ,\\
u(x-L^{-1}e_k) - u(x) &= -(1/L)\,\partial_k u(x) + (1/2L^2)\,\partial_k^2 u(x) + O(L^{-3}) ,
\end{aligned}
\]
where $e_k$ is the $k$-th canonical basis vector of $\mathbb{R}^d$. Summing these two equations, we get an approximation of the second-order derivative $\partial_k^2 u(x)$ as $L \to \infty$:
\[
\partial_k^2 u(x) = L^2\big[ u(x+L^{-1}e_k) + u(x-L^{-1}e_k) - 2u(x) \big] + O(L^{-1}) .
\]

This leads to the definition of the discrete Laplacian of a function $v : D_L \cup \partial D_L \to \mathbb{R}$ at any $x \in D_L$:
\[
\Delta_L v(x) = \sum_{y \sim x} \{ v(y) - v(x) \} ,
\]
where $y \sim x$ means that the sum extends over every $y \in D_L \cup \partial D_L$ at distance $1/L$ from $x$. The Dirichlet problem (2.32) can then be approximated by the discrete Dirichlet problem
\[
\Delta_L u(x) = 0 \quad \text{for all } x \in D_L , \qquad u(y) = f_L(y) \quad \text{for all } y \in \partial D_L , \tag{2.33}
\]
where the Dirichlet constraint $f$ on $\partial D$ is replaced by a discretized constraint $f_L$ on $\partial D_L$.

The solution of the discrete Dirichlet problem can be obtained by using a symmetric random walk on $L^{-1}\mathbb{Z}^d$ with transition kernel $P(x, x \pm L^{-1}e_k) = 1/(2d)$ (the walk jumps to each of its neighbouring points with equal probability). For any $x \in D_L$ and any function $v$, $Pv(x) = (1/2d)\sum_{y \sim x} v(y)$, and the condition $\Delta_L v(x) = 0$ is therefore equivalent to $Pv(x) = v(x)$. Hence a function $v$ satisfying $\Delta_L v(x) = 0$ for all $x \in D_L$ is a harmonic function on $D_L$.

Denote by $\tau = \tau_{\partial D_L}$ the hitting time of the discrete boundary $\partial D_L$ by the random walk. On the event $\{\tau < \infty\}$, $X_\tau$ is the site of the discrete boundary $\partial D_L$ through which the random walk exits the domain. By Proposition 2.25, for every $x \in D_L$, the function $u(x) = E_x[\mathbb{1}_{\{\tau<\infty\}} f_L(X_\tau)]$ satisfies $u(x) = f_L(x)$ on $\partial D_L$ and $Pu(x) = u(x)$ on $D_L$, which is equivalent to the discrete Dirichlet problem.
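To make the method concrete, here is a small Monte Carlo sketch of this procedure on a two-dimensional grid. The domain, the boundary data $f_L$ (value $1$ on the top edge, $0$ elsewhere) and the number of paths are assumptions made for illustration, not taken from the text.

```python
import random

# Monte Carlo sketch of the discrete Dirichlet problem (2.33): estimate
# u(x, y) = E_{(x,y)}[f_L(X_tau)] by averaging over random-walk paths.
L = 10

def f_L(x, y):
    """Assumed boundary constraint: hot top edge, cold elsewhere."""
    return 1.0 if y == L else 0.0

def estimate_u(x, y, n_paths=20000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        cx, cy = x, y
        while 0 < cx < L and 0 < cy < L:       # run until the boundary is hit
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            cx, cy = cx + dx, cy + dy
        total += f_L(cx, cy)                   # boundary value at the exit site
    return total / n_paths

u_center = estimate_u(5, 5)
```

By the symmetry of the walk, each of the four edges is hit first with equal probability from the centre, so the estimate at $(5,5)$ should be close to $1/4$.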


Chapter 5
Limit theorems for stationary ergodic Markov chains

5.1 Dynamical systems

5.1.1 Definitions

Definition 5.1 (Dynamical system) Let $(\Omega, \mathcal{B}, P)$ be a probability space. A map $T : \Omega \to \Omega$ is called a measure-preserving transformation if it is $\mathcal{B}$-measurable and if for all $A \in \mathcal{B}$,
\[
P(T^{-1}(A)) = P(A) .
\]
In that case, the probability $P$ is said to be invariant under the transformation $T$, and $(\Omega, \mathcal{B}, P, T)$ is said to be a dynamical system. The map $T$ is said to be an invertible measure-preserving transformation if it is measure preserving, invertible, and its inverse $T^{-1}$ is measurable.

In the latter case, $T^{-1}$ is also measure preserving, since
\[
P(T(A)) = P(T^{-1}T(A)) = P(A)
\]
for all $A \in \mathcal{B}$. Note also that if $T$ is measure preserving, then for every integer $n \in \mathbb{N}$ and $A \in \mathcal{B}$,
\[
P(T^{-n}(A)) = P(A) .
\]

If $(\Omega, \mathcal{B})$ is the canonical space $(X^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}})$ associated to a measurable space $(X, \mathcal{X})$, let $\theta$ be the shift operator as in Definition 2.4: for $\omega = (\omega_k, k \in \mathbb{N})$,
\[
\theta(\omega_0, \omega_1, \dots) = (\omega_1, \omega_2, \dots) .
\]
Then $\theta$ is measurable by Lemma 2.5, but it is not invertible. Let $\{X_n, n \in \mathbb{N}\}$ be the coordinate process. Then $X_k \circ \theta = X_{k+1}$ for all $k \ge 0$. The next lemma shows the connection between invariance of $P$ under the shift and stationarity of the coordinate process.




Lemma 5.2 Let $(X, \mathcal{X})$ be a measurable space, let $(X^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}})$ be the associated canonical space and let $\{X_n, n \in \mathbb{N}\}$ be the coordinate process. Let $P$ be a probability measure on $(X^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}})$ and let $\theta$ be the shift operator. Then $P$ is invariant for $\theta$ if and only if the coordinate process is stationary under $P$.

Proof. Recall that $\{X_n, n \in \mathbb{N}\}$ is stationary if the distribution of $(X_k, \dots, X_{k+n})$ does not depend on $k$ for all $n \in \mathbb{N}$. Since $(X_k, \dots, X_{k+n}) = (X_0, \dots, X_n) \circ \theta^k$, the invariance of $P$ implies that the distribution of $(X_k, \dots, X_{k+n})$ does not depend on $k$. Conversely, assume that $\{X_n, n \in \mathbb{N}\}$ is stationary. For $H \in \mathcal{X}^{\otimes(n+1)}$, define the cylinder set $A = H \times X^{\mathbb{N}}$. Then,
\[
\begin{aligned}
P(\theta^{-1}(A)) &= P\big(\big\{ \omega \in X^{\mathbb{N}} : (\omega_1, \dots, \omega_{n+1}) \in H \big\}\big)\\
&= P((X_1, \dots, X_{n+1}) \in H)\\
&= P((X_0, \dots, X_n) \in H) = P(A) .
\end{aligned}
\]
It follows by Corollary A.20 that the one-sided shift operator $\theta$ preserves $P$. □

Example 5.3 (One-sided Markov shift). Let $P$ be a kernel on $(X, \mathcal{X})$ which admits an invariant probability $\pi$ on $(X, \mathcal{X})$. By Theorem 1.25, there exists a unique probability measure $P_\pi$ on the canonical space such that the coordinate process is a Markov chain with kernel $P$ and initial distribution $\pi$. Then, by Theorem 1.30, the canonical chain is a stationary process. By Lemma 5.2, this means that $P_\pi$ is $\theta$-invariant, i.e. $P_\pi \circ \theta^{-1} = P_\pi$.

5.1.2 Invariant events

Definition 5.4 (Invariant random variable, invariant event) Let $T : (\Omega, \mathcal{B}) \to (\Omega, \mathcal{B})$ be a measurable map.

(i) A random variable $Y$ defined on $(\Omega, \mathcal{B})$ is invariant for $T$ if $Y \circ T = Y$.
(ii) An event $A$ is invariant for $T$ if $A = T^{-1}(A)$, or equivalently if its indicator function $\mathbb{1}_A$ is invariant for $T$.

Proposition 5.5 Let $T : (\Omega, \mathcal{B}) \to (\Omega, \mathcal{B})$ be a measurable map.

(i) The collection $\mathcal{I}$ of invariant sets for $T$ is a sub-$\sigma$-field of $\mathcal{B}$.
(ii) Let $(E, \mathcal{E})$ be a measurable space such that singletons are measurable and let $Y : (\Omega, \mathcal{B}) \to (E, \mathcal{E})$ be a measurable map. Then $Y$ is invariant for $T$ if and only if $Y$ is $\mathcal{I}$-measurable.

Proof. The proof of (i) is elementary and omitted. Consider now (ii). If $Y : (\Omega, \mathcal{B}) \to (E, \mathcal{E})$ is invariant, then, for $B \in \mathcal{E}$,
\[
T^{-1}(Y^{-1}(B)) = (Y \circ T)^{-1}(B) = Y^{-1}(B) .
\]



Thus $Y^{-1}(B) \in \mathcal{I}$ and $Y$ is $\mathcal{I}$-measurable. Conversely, if $Y$ is $\mathcal{I}$-measurable, then $(Y \circ T)^{-1}(B) = T^{-1}(Y^{-1}(B)) = Y^{-1}(B)$ for all $B \in \mathcal{E}$. Since singletons are assumed to be measurable, we can apply this relation with $B = \{x\}$ for $x \in E$. If $Y(\omega) = x$, then $\omega \in Y^{-1}(\{x\}) = (Y \circ T)^{-1}(\{x\})$, hence $Y \circ T(\omega) = x = Y(\omega)$, i.e. $Y$ is invariant for $T$. □

The most important examples of invariant random variables that we will deal with are limits of sequences or of partial sums.

Lemma 5.6 Let $(X^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}})$ be the canonical space, $\{X_n, n \in \mathbb{N}\}$ the coordinate process and $\theta$ the shift operator. Then $\mathcal{I} = \bigcap_{k \ge 0} \sigma(X_l, l \ge k)$ and, for any real-valued measurable function $f$ defined on $X$, the random variables $\limsup_{n\to\infty} f(X_n)$, $\liminf_{n\to\infty} f(X_n)$, $\limsup_{n\to\infty} n^{-1}(f(X_0) + \cdots + f(X_{n-1}))$ and $\liminf_{n\to\infty} n^{-1}(f(X_0) + \cdots + f(X_{n-1}))$ are invariant random variables.

Proof. Set $\mathcal{G}_k = \sigma(X_l, l \ge k)$ and $\mathcal{G}_\infty = \bigcap_{k \ge 0} \mathcal{G}_k$. Let $Y$ be a $\mathcal{G}_\infty$-measurable random variable. This means that for all $k \ge 0$ there exists a measurable map $G_k$ such that
\[
Y(\omega) = G_k(X_k(\omega), X_{k+1}(\omega), \dots) = G_k(\omega_k, \omega_{k+1}, \dots) .
\]
This yields in particular, for all $\omega \in \Omega$ and $k \ge 1$,
\[
G_0(\omega_0, \omega_1, \dots) = G_k(\omega_k, \omega_{k+1}, \dots) ,
\]
which means that $G_0$ does not depend on $\omega_0, \dots, \omega_{k-1}$. This implies that $G_0 = G_k$ for all $k \ge 1$, or equivalently,
\[
Y = G_0(\omega_k, \omega_{k+1}, \dots) = G_0 \circ \theta^k(\omega) = Y \circ \theta^k .
\]
That is, $Y$ is invariant. Conversely, if $A \in \mathcal{G}_k$, then $\theta^{-1}(A) \in \mathcal{G}_{k+1}$. Therefore, if $A$ is invariant, then $A = \theta^{-1}(A) \in \mathcal{G}_1$, and it can be proved inductively that $A \in \mathcal{G}_k$ for all $k$, hence $A \in \bigcap_{k \ge 0} \mathcal{G}_k$.

The remaining statements of the lemma are straightforward consequences: for instance, $\limsup_{k\to\infty} f(X_k)$ is $\mathcal{G}_\infty$-measurable, hence invariant. □

Let now $(\Omega, \mathcal{B}, P, T)$ be a dynamical system, that is, $T$ is measure preserving for $P$. A random variable $Y$ defined on $\Omega$ is said to be $P$-a.s. invariant for $T$ if $Y \circ T = Y$ $P$-a.s. Similarly, an event $A$ is $P$-a.s. invariant for $T$ if its indicator function $\mathbb{1}_A$ is $P$-a.s. invariant.

Lemma 5.7 If $Y$ is $P$-a.s. invariant, then there exists an invariant random variable $Z$ such that $Y = Z$ $P$-a.s.

Proof. The random variable $Z = \limsup_{n\to\infty} Y \circ T^n$ is invariant. Since $Y$ is $P$-a.s. invariant, $Y = Y \circ T$ $P$-a.s., hence $Y = Y \circ T^n$ $P$-a.s. for all $n \ge 1$. This yields $Y = Z$ $P$-a.s. □

Proposition 5.8 Let $(\Omega, \mathcal{B}, P, T)$ be a dynamical system. If $Y$ is a real-valued random variable such that $E[Y_+] < \infty$, then, for all $k \ge 0$,



\[
E\big[ Y \circ T^k \,\big|\, \mathcal{I} \big] = E[Y \mid \mathcal{I}] \quad P\text{-a.s.}
\]
Proof. Let $A \in \mathcal{I}$. Since $T$ is measure preserving and $\mathbb{1}_A \circ T^k = \mathbb{1}_A$, we have
\[
E\big[ \mathbb{1}_A \, (Y \circ T^k) \big] = E\big[ (\mathbb{1}_A \circ T^k)(Y \circ T^k) \big] = E[\mathbb{1}_A Y] = E\big[ \mathbb{1}_A\, E[Y \mid \mathcal{I}] \big] .
\]
This implies that $E[Y \circ T^k \mid \mathcal{I}] = E[Y \mid \mathcal{I}]$ $P$-a.s. □

The behavior of time averages is given by the following fundamental result.

Theorem 5.9 (Birkhoff's ergodic theorem). Let $(\Omega, \mathcal{B}, P, T)$ be a dynamical system and let $Y$ be a real-valued random variable such that $E[Y_+] < \infty$. Then,
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ T^k = E[Y \mid \mathcal{I}] \quad P\text{-a.s.} \tag{5.1}
\]
Furthermore, if $E[|Y|] < \infty$, the convergence also holds in $L^1(P)$.

Definition 5.10 (Ergodicity) A dynamical system $(\Omega, \mathcal{B}, P, T)$ is ergodic if the invariant $\sigma$-field $\mathcal{I}$ is trivial for $P$, i.e. $P(A) \in \{0,1\}$ for all $A \in \mathcal{I}$.

Corollary 5.11 Let $(\Omega, \mathcal{B}, P, T)$ be an ergodic dynamical system and let $Y$ be a random variable such that $E[Y_+] < \infty$ or $E[Y_-] < \infty$. Then the limit in (5.1) is equal to $E[Y]$.

5.2 Markov chains ergodicity

We now specialize the results of the previous section to the context of Markov chains. Throughout the following sections, we consider a Markov kernel $P$ on a measurable space $(X, \mathcal{X})$ and the coordinate process $\{X_k, k \in \mathbb{N}\}$ on the canonical space $(\Omega, \mathcal{F}) = (X^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}})$, endowed with the family of probability measures $P_\xi$, $\xi \in M_1(\mathcal{X})$, under which the coordinate process is a Markov chain with kernel $P$ and initial distribution $\xi$.

Definition 5.12 (Ergodic probability measure) A probability measure $\pi \in M_1(\mathcal{X})$ is $P$-ergodic if it is invariant for $P$ and if the dynamical system $(\Omega, \mathcal{F}, P_\pi, \theta)$ is ergodic.

Since the dynamical system $(\Omega, \mathcal{F}, P_\pi, \theta)$ is then ergodic, Birkhoff's ergodic theorem (Corollary 5.11) can be applied.

Proposition 5.13 Let $\pi$ be a $P$-ergodic probability measure. Then, for all $Y \in L^1(P_\pi)$,
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ \theta^k = E_\pi[Y] \quad P_\pi\text{-a.s.}
\]
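For a concrete illustration of this ergodic average, one can simulate a small chain and watch the time average of $f(x) = \mathbb{1}_{\{x = 1\}}$ converge to $\pi(f)$. The two-state kernel below is an arbitrary choice made for this sketch, not an example from the text.

```python
import random

# Sketch of Proposition 5.13 on a two-state chain with P(0->1) = p and
# P(1->0) = q (illustrative values); the invariant probability of state 1
# is p / (p + q).
p, q = 0.3, 0.2
pi_1 = p / (p + q)                 # stationary probability of state 1

rng = random.Random(1)
x, n, visits = 0, 200_000, 0
for _ in range(n):
    if x == 0:
        x = 1 if rng.random() < p else 0
    else:
        x = 0 if rng.random() < q else 1
    visits += x                    # f(X_k) with f the indicator of state 1

time_average = visits / n          # should be close to pi_1 = 0.6
```

The chain has a unique invariant probability, so by Corollary 5.17 below it is $P$-ergodic and the time average converges for any starting state.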



We have seen in Section 2.3 that harmonic functions play an important role. We now relate harmonic functions to invariant random variables.

Proposition 5.14 (i) Let $Y$ be a bounded invariant random variable. Then the function $h_Y : x \mapsto h_Y(x) = E_x[Y]$ is a bounded harmonic function.

(ii) Let $h$ be a bounded harmonic function and define $Y = \limsup_{n\to\infty} h(X_n)$. Then $Y$ is an invariant random variable and, for any $\xi \in M_1(\mathcal{X})$, the sequence $\{h(X_n), n \in \mathbb{N}\}$ converges $P_\xi$-a.s. and in $L^1(\Omega, \mathcal{F}, P_\xi)$ to $Y$. Moreover, $h(x) = E_x[Y]$ for all $x \in X$.

(iii) Let $\pi$ be an invariant probability and let $Y \in L^1(\Omega, \mathcal{F}, P_\pi)$ be an invariant random variable. Then $E_x[|Y|] < \infty$ $\pi$-a.e., the function $x \mapsto E_x[Y]$ is $\pi$-integrable and $Y = E_{X_0}[Y]$ $P_\pi$-a.s.

Proof. (i) By the Markov property, for any $x \in X$,
\[
P h_Y(x) = E_x[h_Y(X_1)] = E_x\big[E_{X_1}[Y]\big] = E_x[Y \circ \theta] = E_x[Y] = h_Y(x) ,
\]
showing that $h_Y$ is harmonic.

(ii) Let $h$ be a bounded harmonic function. Then $\{(h(X_n), \mathcal{F}_n), n \in \mathbb{N}\}$ is a bounded $P_\xi$-martingale for any initial distribution $\xi \in M_1(\mathcal{X})$. By Doob's martingale convergence theorem, the sequence $\{h(X_n), n \in \mathbb{N}\}$ converges $P_\xi$-a.s. and in $L^1(\Omega, \mathcal{F}, P_\xi)$ to a limit, which is necessarily equal to $Y$ $P_\xi$-a.s. Since $h$ is harmonic, we have $h(x) = P^n h(x) = E_x[h(X_n)]$ for all $x \in X$ and $n \in \mathbb{N}$. Since $h$ is bounded, taking the limit as $n \to \infty$ and applying the bounded convergence theorem yields $E_x[Y] = \lim_{n\to\infty} E_x[h(X_n)] = h(x)$. This proves that $E_x[Y] = E_x[\limsup_{n\to\infty} h(X_n)] = h(x)$ for all $x \in X$.

(iii) Since $Y \in L^1(\Omega, \mathcal{F}, P_\pi)$, $E_\pi[|Y|] = \int_X \pi(\mathrm{d}x)\, E_x[|Y|]$, showing that $E_x[|Y|] < \infty$ $\pi$-a.e. and that the function $x \mapsto E_x[Y]$ is integrable with respect to $\pi$. By the Markov property and the invariance of $Y$, we get
\[
E_{X_k}[Y] = E_\pi[Y \circ \theta^k \mid \mathcal{F}_k] = E_\pi[Y \mid \mathcal{F}_k] \quad P_\pi\text{-a.s.}
\]

Therefore, $\{(E_{X_k}[Y], \mathcal{F}_k), k \in \mathbb{N}\}$ is a uniformly integrable $P_\pi$-martingale. By Doob's martingale convergence theorem (see Corollary D.25),
\[
\lim_{k\to\infty} E_{X_k}[Y] = \lim_{k\to\infty} E_\pi[Y \mid \mathcal{F}_k] = E_\pi[Y \mid \mathcal{F}] = Y \quad P_\pi\text{-a.s.} \tag{5.2}
\]
and in $L^1(\Omega, \mathcal{F}, P_\pi)$. Moreover, applying successively that $P_\pi$ and $Y$ are $\theta$-invariant, we obtain, for any $k \in \mathbb{N}$,
\[
E_\pi\big[ |Y - E_{X_0}[Y]| \big] = E_\pi\big[ |Y - E_{X_0}[Y]| \circ \theta^k \big] = E_\pi\big[ |Y \circ \theta^k - E_{X_k}[Y]| \big] = E_\pi\big[ |Y - E_{X_k}[Y]| \big] .
\]
Taking the limit as $k$ goes to infinity, (5.2) yields
\[
E_\pi\big[ |Y - E_{X_0}[Y]| \big] = \lim_{k\to\infty} E_\pi\big[ |Y - E_{X_k}[Y]| \big] = 0 . \qquad \square
\]

Remark 5.15 Proposition 5.14 shows that the mapping $Y \mapsto h_Y$, where $h_Y(x) = E_x[Y]$ for $x \in X$, defines a one-to-one correspondence between the bounded harmonic functions and the bounded invariant random variables. If $Y$ is a bounded invariant random variable, then $h_Y : x \mapsto E_x[Y]$ is a bounded harmonic function. If $h$ is a bounded harmonic function, then $h(x) = E_x[Y]$ where $Y = \limsup_{n\to\infty} h(X_n)$ is an invariant random variable (hence $h = h_Y$).

The following proposition shows that if a Markov kernel $P$ has an invariant distribution $\pi$ and the invariant $\sigma$-field is not trivial for $P_\pi$ (i.e. there exists a set $A \in \mathcal{I}$ with $P_\pi(A) \notin \{0,1\}$), then we may construct two mutually singular invariant distributions.

Proposition 5.16 Let $\pi$ be a $P$-invariant probability measure. Assume that there exists an invariant event $A \in \mathcal{I}$ with $\alpha = P_\pi(A) \notin \{0,1\}$. Then there exists $B \in \mathcal{X}$ such that $\pi(B) = \alpha$ and the probability measures $\pi_B$ and $\pi_{B^c}$ defined by
\[
\pi_B(\cdot) = \alpha^{-1} \pi(B \cap \cdot) , \qquad \pi_{B^c}(\cdot) = (1-\alpha)^{-1} \pi(B^c \cap \cdot) ,
\]
are invariant for $P$ and
\[
P_{\pi_B}(X_k \in B \text{ for all } k \ge 0) = P_{\pi_{B^c}}(X_k \in B^c \text{ for all } k \ge 0) = 1 .
\]

Proof. The random variable $Y = \mathbb{1}_A$ is invariant; therefore, by Proposition 5.14-(iii), $\mathbb{1}_A = P_{X_0}(A)$ $P_\pi$-a.s., and, setting $B = \{x \in X : P_x(A) = 1\}$, we get $B \in \mathcal{X}$ and $\mathbb{1}_A = \mathbb{1}_B(X_0)$ $P_\pi$-a.s. Since $A \in \mathcal{I}$, the previous identity implies
\[
\mathbb{1}_A = \mathbb{1}_B(X_0) = \cdots = \mathbb{1}_B(X_k) = \cdots = \prod_{k=0}^{\infty} \mathbb{1}_B(X_k) \quad P_\pi\text{-a.s.}
\]
Therefore, for all $C \in \mathcal{X}$, we get
\[
\begin{aligned}
P_{\pi_B}(X_1 \in C) &= \alpha^{-1} P_\pi(\{X_1 \in C\} \cap \{X_0 \in B\}) = \alpha^{-1} P_\pi(\{X_1 \in C\} \cap \{X_1 \in B\})\\
&= \alpha^{-1} P_\pi(X_1 \in C \cap B) = \alpha^{-1} \pi(C \cap B) = \pi_B(C) ,
\end{aligned}
\]
showing that $\pi_B$ is an invariant probability. Finally,
\[
P_{\pi_B}(X_k \in B \text{ for all } k \ge 0) = \alpha^{-1} P_\pi(X_k \in B \text{ for all } k \ge 0) = \alpha^{-1} P_\pi(X_0 \in B) = 1 .
\]
The same result holds for $\pi_{B^c}$, replacing $B$ by $B^c$. □

This proposition has an important consequence: if a Markov kernel $P$ has a unique invariant distribution $\pi$, then the invariant $\sigma$-field is necessarily trivial for $P_\pi$, and hence the dynamical system $(\Omega, \mathcal{F}, P_\pi, \theta)$ is ergodic. We state this property in the following corollary.

Corollary 5.17 If the Markov kernel $P$ admits a unique invariant probability $\pi$, then $\pi$ is $P$-ergodic.



Proposition 5.18 Let $\pi$ be a probability measure such that
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(X_k) = \pi(f) \quad P_\pi\text{-a.s.}
\]
for all $f \in F_b(X, \mathcal{X})$. Then $\pi$ is ergodic.

Proof. Since $f$ is bounded, the convergence also holds in $L^1(\Omega, \mathcal{F}, P_\pi)$. Then, since by the Markov property $E_\pi[Pf(X_k)] = E_\pi[f(X_{k+1})]$,
\[
\pi(Pf) = E_\pi\Big[ \lim_{n} \frac{1}{n} \sum_{k=0}^{n-1} Pf(X_k) \Big] = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} E_\pi[Pf(X_k)] = E_\pi\Big[ \lim_{n} \frac{1}{n} \sum_{k=1}^{n} f(X_k) \Big] = \pi(f) ,
\]
so $\pi$ is an invariant probability. Let $A \in \mathcal{I}$. Proposition 5.14-(iii) shows that $\mathbb{1}_A = P_{X_0}(A) = \mathbb{1}_B(X_0)$ $P_\pi$-a.s., where $B = \{x \in X : P_x(A) = 1\}$. Since $\mathbb{1}_A = \mathbb{1}_B(X_0) = \cdots = \mathbb{1}_B(X_k)$ $P_\pi$-a.s. for any $k \in \mathbb{N}$, we obtain
\[
\mathbb{1}_B(X_0) = \frac{1}{n} \sum_{k=0}^{n-1} \mathbb{1}_B(X_k) \xrightarrow{P_\pi\text{-a.s.}} \pi(B) .
\]
Therefore $\pi(B) = P_\pi(A) = 0$ or $1$, i.e. the invariant $\sigma$-field is trivial for $P_\pi$ and thus $\pi$ is ergodic. □

Theorem 5.19 (Birkhoff theorem for Markov chains). Let $P$ be a Markov kernel on $(X, \mathcal{X})$ and let $\pi$ be an invariant probability. Let $Y \in L^1(\Omega, \mathcal{F}, P_\pi)$ and let $E_\pi[Y \mid \mathcal{I}]$ be a version of the conditional expectation of $Y$ given $\mathcal{I}$ under $P_\pi$. Then there exists a set $S \in \mathcal{X}$ (which depends on the chosen version) such that $\pi(S) = 1$ and, for each $x \in S$,
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ \theta^k = E_x\big[ E_\pi[Y \mid \mathcal{I}] \big] \quad P_x\text{-a.s.} \tag{5.3}
\]

Proof. Let $Z$ be a version of $E_\pi[Y \mid \mathcal{I}]$ and define $\phi(x) = E_x[Z]$. It follows from Proposition 5.14-(iii) that $E_\pi[Y \mid \mathcal{I}] = \phi(X_0)$ $P_\pi$-a.s. Hence, Theorem 5.9 yields
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ \theta^k = E_\pi[Y \mid \mathcal{I}] = \phi(X_0) \quad P_\pi\text{-a.s.}
\]
Set $A = \big\{ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ \theta^k = \phi(X_0) \big\}$. The previous relation implies $P_\pi(A) = 1$, i.e. $\int \pi(\mathrm{d}x)\, P_x(A) = 1$. Since $P_x(A) \le 1$ for all $x \in X$, this implies that $P_x(A) = 1$ for $\pi$-almost all $x \in X$. Setting $S = \{x \in X : P_x(A) = 1\}$ concludes the proof. □

Corollary 5.20 Let $P$ be a Markov kernel on $(X, \mathcal{X})$ and let $\pi$ be a $P$-ergodic probability. Let $Y \in L^1(\Omega, \mathcal{F}, P_\pi)$. Then, for $\pi$-almost all $x \in X$,
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} Y \circ \theta^k = E_\pi[Y] \quad P_x\text{-a.s.}
\]
Proof. Since $\pi$ is ergodic, the invariant $\sigma$-field $\mathcal{I}$ is trivial for $P_\pi$. This implies $E_\pi[Y \mid \mathcal{I}] = E_\pi[Y]$ $P_\pi$-a.s. □

Corollary 5.21 Let $P$ be a Markov kernel on $(X, \mathcal{X})$. Any two distinct ergodic probability measures $\pi_1$ and $\pi_2$ are mutually singular.

Proof. Note first that if $\pi$ is an ergodic probability measure and $f \in F_b(X, \mathcal{X})$, then, applying Corollary 5.20 to the random variable $Y = f(X_0)$ together with the dominated convergence theorem, we obtain that there exists a set $S \in \mathcal{X}$ such that $\pi(S) = 1$ and, for all $x \in S$,
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} P^k f(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} E_x[f(X_k)] = \pi(f) .
\]
Now assume that $\pi_1$ and $\pi_2$ are two ergodic probabilities with $\pi_1(C) \neq \pi_2(C)$ for some $C \in \mathcal{X}$ and set, for $i = 1, 2$,
\[
S_i = \Big\{ x \in X : \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} P^k \mathbb{1}_C(x) = \pi_i(C) \Big\} .
\]
We have $S_1 \cap S_2 = \emptyset$, $\pi_1(S_1) = 1$ and $\pi_2(S_2) = 1$, which means that $\pi_1$ and $\pi_2$ are mutually singular. □

5.3 Central Limit Theorems for Additive Functionals

Let $\pi$ be an invariant probability measure for the Markov kernel $P$. Denote by $L_0^2(\pi)$ the space of functions $f \in L^2(\pi)$ such that $\pi(f) = 0$. Let $\{X_n, n \in \mathbb{N}\}$ be the canonical Markov chain on $X$ with kernel $P$. Given $f \in L_0^2(\pi)$, consider the partial sum
\[
S_n(f) = \sum_{k=0}^{n-1} f(X_k) . \tag{5.4}
\]
The purpose of this section is to prove a central limit theorem for $n^{-1/2} S_n(f)$. The main tool will be the Poisson equation.

Definition 5.22 Assume that $P$ admits a unique invariant distribution $\pi$. For $f \in F(X, \mathcal{X})$ such that $\pi(|f|) < \infty$, the equation
\[
\hat{f} - P\hat{f} = f - \pi(f) \tag{5.5}
\]
is called the Poisson equation associated to the function $f$. Any $\hat{f} \in F(X, \mathcal{X})$ satisfying $P|\hat{f}|(x) < \infty$ for all $x \in X$ and such that (5.5) holds is called a solution of the Poisson equation associated to $f$.

Provided that a solution of the Poisson equation exists, we can link the quantity $S_n(f)$ to a certain martingale.

Lemma 5.23 Let $f \in F(X, \mathcal{X})$ be a function satisfying $\pi(|f|) < \infty$. Assume that the Poisson equation (5.5) admits a solution $\hat{f}$. Then,
\[
S_n(f) - n\pi(f) = M_n(\hat{f}) + \hat{f}(X_0) - \hat{f}(X_n) ,
\]
where
\[
M_n(\hat{f}) = \sum_{k=1}^{n} \big\{ \hat{f}(X_k) - P\hat{f}(X_{k-1}) \big\}
\]
is a $P_\xi$-martingale for all $\xi \in M_1(\mathcal{X})$.

Proof. The proof is a straightforward consequence of the Markov property, i.e. the fact that for any function $g$ such that $\pi(|g|) < \infty$, $Pg(X_{k-1}) = E[g(X_k) \mid \mathcal{F}_{k-1}]$, so that $\{g(X_k) - Pg(X_{k-1}), k \ge 1\}$ is a martingale difference sequence. □

Denote by $\pi_1$ the joint distribution of $(X_0, X_1)$ under $P_\pi$, i.e. $\pi_1 = \pi \otimes P$. Denote by $\|\cdot\|_{L^2(\pi)}$ and $\|\cdot\|_{L^2(\pi_1)}$ the norms of $L^2(\pi)$ and $L^2(\pi_1)$. We will use the following results, which are direct consequences of the central limit theorem for stationary ergodic martingales, Theorem D.32.

Lemma 5.24 Let $\pi$ be a $P$-ergodic probability measure. Let $G \in L^2(\pi_1)$ and assume that, for all $j \ge 1$, $E[G(X_{j-1}, X_j) \mid \mathcal{F}_{j-1}] = 0$ $P_\pi$-a.s. Then,
\[
n^{-1/2} \sum_{k=1}^{n} G(X_{k-1}, X_k) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}\big(0, \|G\|^2_{L^2(\pi_1)}\big) .
\]
If $g \in L^2(\pi)$ and $G(x,y) = g(y) - Pg(x)$, then
\[
n^{-1/2} \sum_{k=1}^{n} \big\{ g(X_k) - Pg(X_{k-1}) \big\} \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}\big(0, \sigma_\pi^2(g)\big) ,
\]
where $\sigma_\pi^2(g)$ is defined by
\[
\sigma_\pi^2(g) = E_\pi\big[ \{ g(X_1) - Pg(X_0) \}^2 \big] . \tag{5.6}
\]

5.3.1 CLT when a solution to the Poisson equation exists

Our first result is a central limit theorem for additive functionals in the case where a solution to the Poisson equation exists.



Theorem 5.25. Let $\pi$ be a $P$-ergodic probability measure and let $f \in L_0^2(\pi)$. Assume that there exists a solution $\hat{f} \in L^2(\pi)$ to the Poisson equation $\hat{f} - P\hat{f} = f$. Then,
\[
n^{-1/2} \sum_{k=0}^{n-1} f(X_k) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}\big(0, \sigma_\pi^2(\hat{f})\big) ,
\]
with $\sigma_\pi^2(\hat{f})$ as in (5.6).

Proof. By Lemma 5.23, we have $S_n(f) = M_n(\hat{f}) + R_n(\hat{f})$, where $M_n(\hat{f})$ is a martingale which satisfies the assumptions of Lemma 5.24. Thus $n^{-1/2} M_n(\hat{f}) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}(0, \sigma_\pi^2(\hat{f}))$. Moreover, $R_n(\hat{f}) = \hat{f}(X_0) - \hat{f}(X_n)$, so that $E_\pi[R_n^2(\hat{f})] \le 4 \|\hat{f}\|^2_{L^2(\pi)}$ by stationarity, hence $\lim_{n\to\infty} n^{-1} E_\pi[R_n^2(\hat{f})] = 0$. □
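On a finite state space the Poisson equation can be solved explicitly by linear algebra, which makes the variance $\sigma_\pi^2(\hat{f})$ computable. The sketch below uses an arbitrary 3-state kernel, and the "fundamental matrix" device (adding the rank-one term $\mathbb{1}\pi^{\!\top}$ to make $I - P$ invertible) is a standard trick assumed here, not something taken from this chapter.

```python
import numpy as np

# Sketch: solve the Poisson equation fhat - P fhat = f (with pi(f) = 0) on a
# 3-state chain and evaluate the CLT variance sigma_pi^2(fhat) of (5.6).
# The kernel P and the function f are arbitrary illustrative choices.
P = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

vals, vecs = np.linalg.eig(P.T)                 # stationary distribution:
w = vecs[:, np.argmin(np.abs(vals - 1))].real   # left eigenvector for eigenvalue 1
pi = w / w.sum()

f = np.array([1.0, 0.0, -1.0])
f = f - pi @ f                                  # center f so that pi(f) = 0

# Solve (I - P + 1 pi^T) fhat = f; multiplying by pi on the left shows that
# pi(fhat) = 0, so fhat indeed satisfies fhat - P fhat = f.
fhat = np.linalg.solve(np.eye(3) - P + np.outer(np.ones(3), pi), f)
assert np.allclose(fhat - P @ fhat, f)

# sigma_pi^2(fhat) = E_pi[{fhat(X_1) - P fhat(X_0)}^2]
Pfhat = P @ fhat
sigma2 = sum(pi[i] * P[i, j] * (fhat[j] - Pfhat[i]) ** 2
             for i in range(3) for j in range(3))
```

The value `sigma2` can be compared with the series expression for $\sigma^2(f)$ given later in (5.28); for a chain with a spectral gap the two agree.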

5.3.2 Another CLT

We have seen above that a central limit theorem for the additive functional follows from a central limit theorem for martingales, provided that the Poisson equation has a solution in $L^2(\pi)$. We will again use a martingale decomposition, but we will replace the Poisson equation by the resolvent equation, defined for $f \in L^2(\pi)$ and $\lambda > 0$ by
\[
(1+\lambda) f_\lambda - P f_\lambda = f . \tag{5.7}
\]

Contrary to the Poisson equation, the resolvent equation always has a solution $f_\lambda$ in $L^2(\pi)$, because $(1+\lambda)I - P$ is invertible for all $\lambda > 0$. This solution is given by
\[
f_\lambda = (1+\lambda)^{-1} \sum_{j=0}^{\infty} (1+\lambda)^{-j} P^j f . \tag{5.8}
\]
By Proposition 1.31, $P$ is a contraction of $L^2(\pi)$, thus
\begin{align}
\| f_\lambda \|_{L^2(\pi)} &\le (1+\lambda)^{-1} \sum_{j=0}^{\infty} (1+\lambda)^{-j} \big\| P^j f \big\|_{L^2(\pi)} \tag{5.9}\\
&\le (1+\lambda)^{-1} \sum_{j=0}^{\infty} (1+\lambda)^{-j} \| f \|_{L^2(\pi)} = \lambda^{-1} \| f \|_{L^2(\pi)} . \tag{5.10}
\end{align}

For all $f \in L^2(\pi)$ and $n \ge 1$, define the partial sum
\[
V_n f := \sum_{k=0}^{n-1} P^k f . \tag{5.11}
\]



By Proposition 1.31, $P$ is a contraction of $L^2(\pi)$, so that $V_n$ is a bounded linear operator on $L^2(\pi)$ for each $n$. The main result of this section is Theorem 5.31, which states that if
\[
\sum_{n=1}^{\infty} n^{-3/2} \| V_n f \|_{L^2(\pi)} < \infty , \tag{5.12}
\]
then
\[
\sigma^2(f) = \lim_{n\to\infty} n^{-1} E_\pi[S_n^2(f)] \tag{5.13}
\]
exists and is finite, and $n^{-1/2} S_n(f) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}(0, \sigma^2(f))$ as $n \to \infty$.

Remark 5.26 (Comments on Condition (5.12))
• Assume first that there is a solution $\hat{f} \in L^2(\pi)$ to the Poisson equation $f = \hat{f} - P\hat{f}$. Then
\[
V_n f = \sum_{k=0}^{n-1} P^k f = \sum_{k=0}^{n-1} P^k (I-P) \hat{f} = \hat{f} - P^n \hat{f}
\]
and, therefore, $\| V_n f \|_{L^2(\pi)} \le 2 \|\hat{f}\|_{L^2(\pi)}$ and the series in (5.12) converges.
• If (5.13) holds, then, by Jensen's inequality,
\[
\| V_n f \|^2_{L^2(\pi)} = E_\pi\big[ \big( E_{X_0}[S_n(f)] \big)^2 \big] \le E_\pi[S_n^2(f)] = O(n) .
\]
Thus $\| V_n f \|_{L^2(\pi)} = O(\sqrt{n})$, and we can say that Condition (5.12) is within a logarithmic term of being necessary.

We now give the martingale decomposition which we will use. Define
\[
M_n(\lambda) = \sum_{j=1}^{n} \big\{ f_\lambda(X_j) - P f_\lambda(X_{j-1}) \big\} , \tag{5.14}
\]
\[
R_n(\lambda) = f_\lambda(X_0) - f_\lambda(X_n) . \tag{5.15}
\]

Lemma 5.27 For each fixed $\lambda > 0$ and all $n \ge 1$,
\[
S_n(f) = M_n(\lambda) + R_n(\lambda) + \lambda S_n(f_\lambda) . \tag{5.16}
\]

Moreover, Mn(λ ),n≥ 0 is a Pπ -martingale,

n−1/2Mn(λ )Pπ=⇒ N(0,σ2

π ( fλ ))

andEπ [R2

n(λ )]≤ 4∥∥ fλ

∥∥2L2(π)

. (5.17)

Proof. Since $f_\lambda$ is the solution of the resolvent equation, we have
\[
\begin{aligned}
S_n(f) &= \sum_{k=0}^{n-1} \big\{ (1+\lambda) f_\lambda(X_k) - P f_\lambda(X_k) \big\}\\
&= \sum_{j=0}^{n-1} \big\{ f_\lambda(X_j) - P f_\lambda(X_j) \big\} + \lambda S_n(f_\lambda)\\
&= \sum_{j=1}^{n} \big\{ f_\lambda(X_j) - P f_\lambda(X_{j-1}) \big\} + f_\lambda(X_0) - f_\lambda(X_n) + \lambda S_n(f_\lambda)\\
&= M_n(\lambda) + R_n(\lambda) + \lambda S_n(f_\lambda) .
\end{aligned}
\]
This proves (5.16). The central limit theorem for $M_n(\lambda)$ follows from Lemma 5.24. The bound (5.17) is straightforward since the distribution of $X_n$ is $\pi$. □

We now state and prove a central limit theorem under two ad hoc assumptions on $f_\lambda$. Define, for $(x_0, x_1) \in X^2$,
\[
H_\lambda(x_0, x_1) = f_\lambda(x_1) - P f_\lambda(x_0) . \tag{5.18}
\]
Recall that $\pi_1 = \pi \otimes P$ denotes the distribution of $(X_0, X_1)$.

Theorem 5.28. Let $\pi$ be a $P$-ergodic probability measure. Assume that there exist a function $H \in L^2(\pi_1)$ and a sequence $\{\lambda_k, k \in \mathbb{N}\}$ such that
\[
0 < \liminf_{k\to\infty} k\lambda_k \le \limsup_{k\to\infty} k\lambda_k < \infty , \tag{5.19a}
\]
\[
\lim_{k\to\infty} \sqrt{\lambda_k}\, \| f_{\lambda_k} \|_{L^2(\pi)} = 0 , \tag{5.19b}
\]
\[
\lim_{k\to\infty} \| H_{\lambda_k} - H \|_{L^2(\pi_1)} = 0 . \tag{5.19c}
\]
Then, $n^{-1/2} S_n(f) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}\big(0, \|H\|^2_{L^2(\pi_1)}\big)$.

Proof. Since $\pi_1(H_\lambda) = 0$ and $H_{\lambda_k}$ converges to $H$ in $L^2(\pi_1)$, we have $\pi_1(H) = 0$. Since $\int P(x_0, \mathrm{d}x_1)\, H_\lambda(x_0, x_1) = 0$, we have
\[
\begin{aligned}
\int \pi(\mathrm{d}x_0) \Big[ \int P(x_0, \mathrm{d}x_1)\, H(x_0, x_1) \Big]^2
&= \int \pi(\mathrm{d}x_0) \Big[ \int P(x_0, \mathrm{d}x_1)\, \{H(x_0, x_1) - H_{\lambda_k}(x_0, x_1)\} \Big]^2\\
&\le \int \pi_1(\mathrm{d}x_0, \mathrm{d}x_1)\, [H(x_0, x_1) - H_{\lambda_k}(x_0, x_1)]^2 = \| H_{\lambda_k} - H \|^2_{L^2(\pi_1)} .
\end{aligned}
\]
By assumption (5.19c), this proves that $\int P(x_0, \mathrm{d}x_1)\, H(x_0, x_1) = 0$ $\pi$-a.e. Hence $E[H(X_j, X_{j+1}) \mid \mathcal{F}_j] = 0$ $P_\pi$-a.s. For $n \ge 1$, set $M_n = \sum_{j=1}^{n} H(X_{j-1}, X_j)$. Then $\{M_n\}$ is a martingale and, by Lemma 5.24,
\[
n^{-1/2} M_n \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}\big(0, \|H\|^2_{L^2(\pi_1)}\big) . \tag{5.20}
\]

Page 94: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x

5.3 Central Limit Theorems for Additive Functionals 141

For each fixed $n$, since $E_\pi[\{M_n(\lambda_k) - M_n\}^2] = n \| H_{\lambda_k} - H \|^2_{L^2(\pi_1)}$, Condition (5.19c) implies that
\[
\lim_{k\to\infty} E_\pi[\{M_n(\lambda_k) - M_n\}^2] = 0 . \tag{5.21}
\]
Next, Condition (5.19b) implies that, for any $n$,
\[
\lim_{k\to\infty} \lambda_k\, E_\pi[S_n^2(f_{\lambda_k})] = 0 . \tag{5.22}
\]
Using the decomposition (5.16), (5.21) and (5.22) show that, for $j, k > 0$,
\[
E_\pi[(R_n(\lambda_j) - R_n(\lambda_k))^2] \le 2 E_\pi[(M_n(\lambda_j) - M_n(\lambda_k))^2] + 4\lambda_j^2\, E_\pi[S_n^2(f_{\lambda_j})] + 4\lambda_k^2\, E_\pi[S_n^2(f_{\lambda_k})] .
\]
Therefore, for any fixed $n$, $\{R_n(\lambda_k), k \in \mathbb{N}\}$ is a Cauchy sequence in $L^2(P_\pi)$ and there exists a random variable $R_n \in L^2(P_\pi)$ such that
\[
\lim_{k\to\infty} E_\pi[\{R_n(\lambda_k) - R_n\}^2] = 0 . \tag{5.23}
\]
Therefore, letting $\lambda \to 0$ along the subsequence $\{\lambda_k\}$ in the decomposition (5.16) yields
\[
S_n(f) = M_n + R_n \quad P_\pi\text{-a.s.} \tag{5.24}
\]
It remains to show that $E_\pi[R_n^2] = o(n)$ as $n \to \infty$. Applying the decompositions (5.16) and (5.24) and the conditions (5.19), we obtain
\[
\begin{aligned}
E_\pi[R_n^2] &= E_\pi[(M_n(\lambda_n) - M_n + \lambda_n S_n(f_{\lambda_n}) + R_n(\lambda_n))^2]\\
&\le 3 E_\pi[\{M_n(\lambda_n) - M_n\}^2] + 3\lambda_n^2\, E_\pi[S_n(f_{\lambda_n})^2] + 3 E_\pi[R_n(\lambda_n)^2]\\
&\le 3n \| H_{\lambda_n} - H \|^2_{L^2(\pi_1)} + 3n \Big( n\lambda_n + \frac{4}{n\lambda_n} \Big) \lambda_n \| f_{\lambda_n} \|^2_{L^2(\pi)} = o(n) . \qquad \square
\end{aligned}
\]

It remains to show that Condition (5.12) implies conditions (5.19b) and (5.19c). This is the purpose of Lemmas 5.29 and 5.30.

Lemma 5.29 Assume that (5.12) holds and define $\delta_k = 2^{-k}$. Then
\[
\sum_{k=0}^{\infty} \sqrt{\delta_k}\, \| f_{\delta_k} \|_{L^2(\pi)} < \infty .
\]

Proof. Applying summation by parts, we have
\[
f_\delta = \sum_{k=1}^{\infty} \frac{P^{k-1} f}{(1+\delta)^k} = \delta \sum_{n=1}^{\infty} \frac{V_n f}{(1+\delta)^{n+1}} .
\]
Using this and the Minkowski inequality yields

Using this and the Minkovski inequality yields


\[
\| f_{\delta_k} \|_{L^2(\pi)} \le \delta_k \sum_{n=1}^{\infty} (1+\delta_k)^{-n-1} \| V_n f \|_{L^2(\pi)} .
\]

This implies, by changing the order of summation,
\[
\sum_{k=0}^{\infty} \sqrt{\delta_k}\, \| f_{\delta_k} \|_{L^2(\pi)} \le \sum_{n=1}^{\infty} \Big[ \sum_{k=0}^{\infty} \frac{\delta_k^{3/2}}{(1+\delta_k)^{n+1}} \Big] \| V_n f \|_{L^2(\pi)} .
\]

The quantity between brackets is equal to $\sum_{k=0}^{\infty} (\delta_{k-1} - \delta_k)\, h_n(\delta_k)$, where $h_n(x) = \sqrt{x}/(1+x)^{n+1}$ and, by convention, $\delta_{-1} = 2$ (note that $\delta_{k-1} - \delta_k = \delta_k$ for every $k \ge 0$). Since $h_n$ is nonincreasing, this series is upper bounded by
\[
\int_0^2 h_n(x)\,\mathrm{d}x \le \int_0^2 \sqrt{x}\, e^{-nx/2}\,\mathrm{d}x \le n^{-3/2} \int_0^{\infty} \sqrt{u}\, e^{-u/2}\,\mathrm{d}u = O(n^{-3/2}) .
\]
Together with (5.12), this proves the claim. □

Lemma 5.30 Assume that (5.12) holds and set $\delta_k = 2^{-k}$. Then there exists a function $H \in L^2(\pi_1)$ such that (5.19c) holds.

Proof. For a measure $\nu$ on an arbitrary measurable space, let $\langle \cdot, \cdot \rangle_{L^2(\nu)}$ denote the scalar product of the space $L^2(\nu)$. Since $f_\lambda$ is a solution of the resolvent equation, we have $P f_\lambda = (1+\lambda) f_\lambda - f$ and thus, for $\lambda, \mu > 0$,
\[
\begin{aligned}
\langle H_\lambda, H_\mu \rangle_{L^2(\pi_1)} &= \langle f_\lambda, f_\mu \rangle_{L^2(\pi)} - \langle P f_\lambda, P f_\mu \rangle_{L^2(\pi)}\\
&= -(\lambda + \mu + \lambda\mu) \langle f_\lambda, f_\mu \rangle_{L^2(\pi)} + (1+\lambda) \langle f_\lambda, f \rangle_{L^2(\pi)} + (1+\mu) \langle f_\mu, f \rangle_{L^2(\pi)} - \| f \|^2_{L^2(\pi)} .
\end{aligned}
\]
This yields, applying the Cauchy-Schwarz inequality,
\[
\begin{aligned}
\| H_\lambda - H_\mu \|^2_{L^2(\pi_1)} &= \| H_\lambda \|^2_{L^2(\pi_1)} - 2 \langle H_\lambda, H_\mu \rangle_{L^2(\pi_1)} + \| H_\mu \|^2_{L^2(\pi_1)}\\
&= -(2\lambda + \lambda^2) \| f_\lambda \|^2_{L^2(\pi)} + 2(\lambda + \mu + \lambda\mu) \langle f_\lambda, f_\mu \rangle_{L^2(\pi)} - (2\mu + \mu^2) \| f_\mu \|^2_{L^2(\pi)}\\
&\le (\lambda + \mu) \big\{ \| f_\lambda \|^2_{L^2(\pi)} + \| f_\mu \|^2_{L^2(\pi)} \big\} .
\end{aligned}
\]

Applying this bound with $\lambda = \delta_k$ and $\mu = \delta_{k-1}$ together with Lemma 5.29 yields
\[
\begin{aligned}
\sum_{k=1}^{\infty} \| H_{\delta_k} - H_{\delta_{k-1}} \|_{L^2(\pi_1)} &\le \sqrt{3} \sum_{k=1}^{\infty} \sqrt{\delta_k}\, \| f_{\delta_k} \|_{L^2(\pi)} + \sqrt{3/2} \sum_{k=1}^{\infty} \sqrt{\delta_{k-1}}\, \| f_{\delta_{k-1}} \|_{L^2(\pi)}\\
&\le \big( \sqrt{3} + \sqrt{3/2} \big) \sum_{k=0}^{\infty} \sqrt{\delta_k}\, \| f_{\delta_k} \|_{L^2(\pi)} < \infty .
\end{aligned}
\]
This proves the convergence of the sequence $\{H_{\delta_k}, k \in \mathbb{N}\}$ in $L^2(\pi_1)$. □



We now have all the necessary ingredients to state the main result of this section.

Theorem 5.31. Let $\pi$ be a $P$-ergodic probability measure and let $f \in L_0^2(\pi)$ be a function such that (5.12) holds. Then $n^{-1/2} S_n(f) \stackrel{P_\pi}{\Longrightarrow} \mathrm{N}(0, \sigma^2(f))$, where $\sigma^2(f) = \lim_{n\to\infty} n^{-1} E_\pi[S_n^2(f)]$.

Proof. Let $k_n$ be the unique integer such that $2^{k_n - 1} \le n < 2^{k_n}$ and define $\lambda_n = 2^{-k_n}$ for $n \ge 1$. Then $1/2 \le n\lambda_n \le 1$, i.e. (5.19a) holds. Moreover, $\{\lambda_n, n \ge 1\} \subset \{2^{-k}, k \in \mathbb{N}\}$. Thus, Lemmas 5.29 and 5.30 yield (5.19b) and (5.19c), and Theorem 5.28 applies. □

We now give sufficient conditions for (5.12) which may be easier to check on examples.

Lemma 5.32 Consider the following conditions for f ∈ L²₀(π):

∑_{k=1}^∞ k^{−1/2} ‖P^k f‖_{L²(π)} < ∞ , (5.25)

∑_{k=1}^∞ log^{1+δ}(k) ‖P^k f‖²_{L²(π)} < ∞ , (5.26)

for some δ > 0. Then (5.26) implies (5.25), which implies (5.12).

Proof. By the Cauchy–Schwarz inequality,

∑_{k=1}^∞ k^{−1/2} ‖P^k f‖_{L²(π)} ≤ ( ∑_{k=1}^∞ k^{−1} log₊^{−(1+δ)}(k) )^{1/2} ( ∑_{k=1}^∞ log₊^{1+δ}(k) ‖P^k f‖²_{L²(π)} )^{1/2} .

Thus (5.26) implies (5.25). Next,

∑_{n=1}^∞ n^{−3/2} ‖ ∑_{k=0}^{n−1} P^k f ‖_{L²(π)} ≤ ∑_{n=1}^∞ n^{−3/2} ∑_{k=0}^{n−1} ‖P^k f‖_{L²(π)} = ∑_{k=0}^∞ ‖P^k f‖_{L²(π)} ∑_{n=k+1}^∞ n^{−3/2} ≤ C ∑_{k=0}^∞ (k+1)^{−1/2} ‖P^k f‖_{L²(π)} .

Thus (5.25) implies (5.12). □

Lemma 5.33 For f ∈ L²₀(π), if

∑_{k=1}^∞ ‖P^k f‖_{L²(π)} < ∞ , (5.27)

then Condition (5.12) holds and



σ²(f) = ‖f‖²_{L²(π)} + 2 ∑_{k=1}^∞ ⟨f, P^k f⟩_{L²(π)} (5.28)
 ≤ ‖f‖²_{L²(π)} + 2 ‖f‖_{L²(π)} ∑_{k=1}^∞ ‖P^k f‖_{L²(π)} .

Proof. Since π is invariant for the kernel P and π(f) = 0,

n^{−1} E_π[ ( ∑_{k=0}^{n−1} f(X_k) )² ] = n^{−1} ∑_{i,j=0}^{n−1} E_π[f(X_i) f(X_j)]
 = n^{−1} ∑_{i,j=0}^{n−1} ⟨f, P^{|i−j|} f⟩_{L²(π)} = ‖f‖²_{L²(π)} + 2 ∑_{k=1}^{n−1} (1 − k/n) ⟨f, P^k f⟩_{L²(π)} .

If (5.27) holds, then the Cauchy–Schwarz inequality yields

∑_{k=1}^∞ |⟨f, P^k f⟩_{L²(π)}| ≤ ‖f‖_{L²(π)} ∑_{k=1}^∞ ‖P^k f‖_{L²(π)} < ∞ .

This yields (5.28) by the dominated convergence theorem. □
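As a concrete numerical sketch of formula (5.28) (the two-state kernel and the function f below are illustrative choices of ours, not from the book): for this chain f is an eigenvector of P with P^k f = 0.5^k f, so the series converges geometrically and the exact value is σ²(f) = 6 + 12 ∑_{k≥1} 0.5^k = 18.

```python
import numpy as np

# Illustrative two-state kernel (not from the book).
a, b = 0.3, 0.2
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)     # invariant probability (0.4, 0.6)
f = np.array([3.0, -2.0])           # chosen so that pi(f) = 0
assert abs(pi @ f) < 1e-12

# Formula (5.28): sigma^2(f) = ||f||^2_{L^2(pi)} + 2 sum_k <f, P^k f>_{L^2(pi)},
# truncated after 200 terms; condition (5.27) holds since ||P^k f|| = 0.5^k ||f||.
sigma2 = pi @ (f * f)
Pkf = f.copy()
for _ in range(200):
    Pkf = P @ Pkf
    sigma2 += 2 * (pi @ (f * Pkf))
print(sigma2)                       # close to 18
```

The truncation is harmless here because ⟨f, P^k f⟩ decays like 0.5^k; for slower-mixing chains one would need the summability conditions of Lemma 5.32 to justify it.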

5.3.3 A CLT for all initial distributions ξ

Finally, we give a simple criterion to prove the central limit theorem when the initial distribution is not the invariant probability measure. This criterion is related to convergence in total variation distance, which will be developed in Chapter 6. Define the total variation distance between two probability measures µ, ν ∈ M₁(𝒳) by

‖µ − ν‖_TV = sup_{h ∈ F_b(X,𝒳), |h|_∞ ≤ 1} |µ(h) − ν(h)| .

Theorem 5.34. Let π be a P-invariant probability measure and f ∈ L²₀(π). Assume that there exists σ² such that n^{−1/2} ∑_{k=0}^{n−1} f(X_k) ⇒ N(0, σ²) under P_π. Let ξ ∈ M₁(𝒳) be such that

lim_{n→∞} ‖ξP^n − π‖_TV = 0 . (5.29)

Then n^{−1/2} ∑_{k=0}^{n−1} f(X_k) ⇒ N(0, σ²) under P_ξ.

Proof. Set S_n = ∑_{k=0}^{n−1} f(X_k) and fix x ∈ X and m ∈ ℕ*. For n > m and t ∈ ℝ,



∆_n(t) = |E_ξ[exp(itn^{−1/2}S_n)] − E_π[exp(itn^{−1/2}S_n)]|
 ≤ |E_ξ[E_{X_m}[exp(itn^{−1/2}S_{n−m})]] − E_π[E_{X_m}[exp(itn^{−1/2}S_{n−m})]]| + r_{m,n} ,

with

r_{m,n} ≤ E_ξ[|exp(itn^{−1/2}S_m) − 1|] + E_π[|exp(itn^{−1/2}S_m) − 1|] .

By the bounded convergence theorem, for each m ∈ ℕ, lim_{n→∞} r_{m,n} = 0. Moreover, by definition of the total variation distance,

|E_ξ[E_{X_m}[exp(itn^{−1/2}S_{n−m})]] − E_π[E_{X_m}[exp(itn^{−1/2}S_{n−m})]]|
 = | ∫_X E_y[e^{itn^{−1/2}S_{n−m}}] ξP^m(dy) − ∫_X E_y[e^{itn^{−1/2}S_{n−m}}] π(dy) | ≤ ‖ξP^m − π‖_TV .

Applying Assumption (5.29), this yields

limsup_{n→∞} ∆_n(t) ≤ lim_{m→∞} ‖ξP^m − π‖_TV = 0 . □

5.4 Examples

Example 5.35 (Bernoulli shift). Let {ε_n, n ∈ ℕ} be i.i.d. random variables taking the values 0 and 1 with probability 1/2 each and define

X_n = (1/2)(X_{n−1} + ε_n) , n ≥ 1 .

Then {X_n, n ∈ ℕ} is a Markov chain with values in [0,1] and kernel P defined by P(x,{x/2}) = 1/2 = P(x,{(1+x)/2}) for x ∈ [0,1]. The unique, hence ergodic, invariant distribution is Lebesgue's measure on [0,1]. Set D_k = {j2^{−k} : j = 0,…,2^k − 1}. Let f be a square integrable function defined on [0,1] such that ∫_0^1 f(x)dx = 0. Since Lebesgue's measure is invariant for P, it also holds that ∫_0^1 P^k f(x)dx = 0 for all k ≥ 1. Therefore,

P^k f(x) = 2^{−k} ∑_{z∈D_k} f(x/2^k + z) = 2^{−k} ∑_{z∈D_k} ∫_0^1 [f(x/2^k + z) − f(y/2^k + z)] dy .



Let ‖·‖₂ denote the L² norm with respect to Lebesgue's measure on [0,1]. Then, the previous identity yields

‖P^k f‖²₂ ≤ 2^{−k} ∑_{z∈D_k} ∫_0^1 ∫_0^1 [f(x/2^k + z) − f(y/2^k + z)]² dy dx
 ≤ 2^k ∫∫_{|x−y| ≤ 2^{−k}} [f(x) − f(y)]² dx dy .

Fix δ > 0 and define, for 0 < z < 1,

J(z) = ∑_{k : 2^{−k} ≥ z} 2^k log^{1+δ}(k) .

Then

∑_{k=1}^∞ log^{1+δ}(k) ‖P^k f‖²₂ ≤ ∫_0^1 ∫_0^1 J(|x−y|) [f(x) − f(y)]² dx dy .

Moreover, there exists a constant C > 0 such that 0 ≤ J(z) ≤ C z^{−1} log^{1+δ}[log(1/z)]. Therefore, if

∫_0^1 ∫_0^1 [f(y) − f(x)]² |x−y|^{−1} log^{1+δ}[−log(|x−y|)] dx dy < ∞

for some δ > 0, then (5.26) holds. This is a very weak smoothness assumption, which holds in particular if f is a Lipschitz function on [0,1].
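A quick simulation sketch of this example (the code and the choice f(x) = x − 1/2 are ours, not the book's): for this Lipschitz f one checks P^k f = 2^{−k} f, so formula (5.28) gives σ² = 1/12 + (2/12) ∑_{k≥1} 2^{−k} = 1/4, and the empirical variance of n^{−1/2} S_n(f) under the stationary chain should be close to 1/4.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_Sn(n, n_paths):
    """Simulate n steps of the Bernoulli shift X_k = (X_{k-1} + eps_k)/2
    and return S_n(f)/sqrt(n) for f(x) = x - 1/2, for n_paths stationary paths."""
    x = rng.random(n_paths)                  # X_0 ~ Lebesgue on [0,1] (stationary)
    s = np.zeros(n_paths)
    for _ in range(n):
        s += x - 0.5
        x = 0.5 * (x + rng.integers(0, 2, n_paths))
    return s / np.sqrt(n)

# Theorem 5.31 predicts n^{-1/2} S_n(f) => N(0, 1/4) for this f.
z = simulate_Sn(n=500, n_paths=20000)
print(z.var())   # should be close to 0.25
```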


Chapter 6
Uniformly ergodic Markov chains

We discuss in this chapter one of the strongest forms of convergence of Markov chains, namely uniform ergodicity. Many Markov chains turn out not to be uniformly ergodic, but the theory of uniform ergodicity is straightforward and provides an easy introduction to the more sophisticated conditions that imply the more general notion of geometric ergodicity.

6.1 Total Variation Distance

Let (X,𝒳) be a measurable space and ξ be a finite signed measure on (X,𝒳). Then, by the Jordan decomposition theorem (Theorem A.14), there exists a unique pair of positive finite measures ξ₊, ξ₋ on (X,𝒳) such that ξ₊ and ξ₋ are singular and ξ = ξ₊ − ξ₋. The couple (ξ₊, ξ₋) is referred to as the Jordan decomposition of the signed measure ξ. The finite measure |ξ| = ξ₊ + ξ₋ is called the total variation of ξ. It is the smallest measure ν such that ν(A) ≥ |ξ(A)| for all A ∈ 𝒳. Any set S such that ξ₊(Sᶜ) = ξ₋(S) = 0 is called a Jordan set for ξ. A Jordan set is essentially unique in the following sense: if S and S′ are Jordan sets, then |ξ|(S ∆ S′) = 0. We have, for all A ∈ 𝒳,

ξ₊(A) = ξ(A ∩ S) , ξ₋(A) = −ξ(A ∩ Sᶜ) .

Then, for A ∈ 𝒳,

ξ(A) = ξ(A ∩ S) + ξ(A ∩ Sᶜ) = |ξ|(A ∩ S) − |ξ|(A ∩ Sᶜ) = ∫_A (1_S − 1_{Sᶜ}) d|ξ| .

Thus ξ = (1_S − 1_{Sᶜ}) · |ξ| and this yields the following result.

Proposition 6.1 A set function ξ is a signed measure if and only if there exist µ ∈ M₊(𝒳) and h ∈ L¹(µ) such that ξ = h · µ. Then S = {h ≥ 0} is a Jordan set for ξ, ξ₊ = h₊ · µ, ξ₋ = h₋ · µ, and |ξ| = |h| · µ.




Definition 6.2 (Total variation distance) Let ξ be a finite signed measure on (X,𝒳) with Jordan decomposition (ξ₊, ξ₋). The total variation norm of ξ is defined by

‖ξ‖_TV = |ξ|(X) .

The total variation distance between two probability measures ξ, ξ′ ∈ M₁(𝒳) is defined by

d_TV(ξ, ξ′) = (1/2) ‖ξ − ξ′‖_TV .

Note that d_TV(ξ, ξ′) = (ξ − ξ′)(S) where S is a Jordan set for ξ − ξ′. Let M₀(𝒳) be the set of finite signed measures ξ such that ξ(X) = 0. We now give equivalent characterizations of the total variation norm for signed measures. Let the oscillation osc(f) of a bounded function f be defined by

osc(f) = sup_{x,x′∈X} |f(x) − f(x′)| = 2 inf_{c∈ℝ} |f − c|_∞ .

Theorem 6.3. For any ξ ∈ M_s(𝒳),

‖ξ‖_TV = sup{ξ(f) : f ∈ F_b(X,𝒳), |f|_∞ ≤ 1} . (6.1)

If moreover ξ ∈ M₀(𝒳), then

‖ξ‖_TV = 2 sup{ξ(f) : f ∈ F_b(X,𝒳), osc(f) ≤ 1} . (6.2)

Proof. By Proposition 6.1, ξ = h · µ with h ∈ L¹(µ) and µ ∈ M₊(𝒳). The proof of (6.1) follows from the identity

‖ξ‖_TV = ∫_X |h| dµ = ∫_X (1_{h>0} − 1_{h<0}) h dµ = sup_{|f|_∞≤1} ∫ f h dµ .

Now, let ξ ∈ M₀(𝒳). Then ξ(f) = ξ(f + c) for all c ∈ ℝ and thus, for all c ∈ ℝ,

|ξ(f)| = |ξ(f − c)| ≤ ‖ξ‖_TV |f − c|_∞ .

Since this inequality is valid for all c ∈ ℝ, this yields

|ξ(f)| ≤ ‖ξ‖_TV inf_{c∈ℝ} |f − c|_∞ = (1/2) ‖ξ‖_TV osc(f) . (6.3)

Conversely, if we set f = (1/2)(1_S − 1_{Sᶜ}) where S is a Jordan set for ξ, then osc(f) = 1 and

ξ(f) = (1/2)[ξ₊(S) + ξ₋(Sᶜ)] = (1/2)[ξ₊(X) + ξ₋(X)] = (1/2) ‖ξ‖_TV .

Combining with (6.3) proves (6.2). □



Corollary 6.4 If ξ, ξ′ ∈ M₁(𝒳), then ξ − ξ′ ∈ M₀(𝒳) and for any f ∈ F_b(X,𝒳),

|ξ(f) − ξ′(f)| ≤ d_TV(ξ, ξ′) osc(f) . (6.4)

Proposition 6.5 If (X,𝒳) is a metric space, then convergence in total variation of a sequence of probability measures on (X,𝒳) implies its weak convergence.

Proof. By Theorem 6.3, convergence in total variation implies that lim_{n→∞} ξ_n(h) = ξ(h) for every bounded measurable function h. This is stronger than weak convergence, which only requires this convergence for bounded continuous functions h defined on X. □

Theorem 6.6. The space (M_s(𝒳), ‖·‖_TV) is complete.

Proof. Let {ξ_n, n ∈ ℕ} be a Cauchy sequence in M_s(𝒳). Define

λ = ∑_{n=0}^∞ 2^{−n} |ξ_n| ,

which is a measure, as the limit of an increasing sequence of measures. By construction, |ξ_n| ≪ λ for every n ∈ ℕ. Therefore, there exist functions f_n ∈ L¹(λ) such that ξ_n = f_n · λ and ‖ξ_n − ξ_m‖_TV = ∫ |f_n − f_m| dλ. This implies that {f_n, n ∈ ℕ} is a Cauchy sequence in L¹(λ), which is complete. Thus, there exists f ∈ L¹(λ) such that f_n → f in L¹(λ). Setting ξ = f · λ, we obtain that ξ ∈ M_s(𝒳) and lim_{n→∞} ‖ξ_n − ξ‖_TV = lim_{n→∞} ∫ |f_n − f| dλ = 0. □

We now define and characterize the minimum and maximum of two positive measures.

Proposition 6.7 Let ξ, ξ′ ∈ M₊(𝒳) be two measures.

(i) The set of measures η such that η ≤ ξ and η ≤ ξ′ admits a maximal element, denoted by ξ ∧ ξ′ and called the minimum of ξ and ξ′.
(ii) The set of measures η such that η ≥ ξ and η ≥ ξ′ admits a minimal element, denoted by ξ ∨ ξ′ and called the maximum of ξ and ξ′.
(iii) If ξ = f · µ and ξ′ = f′ · µ, then ξ ∧ ξ′ = (f ∧ f′) · µ and ξ ∨ ξ′ = (f ∨ f′) · µ.
(iv) If ξ(X) ∨ ξ′(X) < ∞, then |ξ − ξ′| = ξ + ξ′ − 2 ξ ∧ ξ′.

Proof. We assume ξ = f · µ and ξ′ = f′ · µ (we may set µ = ξ + ξ′). Let ρ = (f ∧ f′) · µ. If η ≤ ξ and η ≤ ξ′, we have

η(A) = η(A ∩ {f ≥ f′}) + η(A ∩ {f < f′}) ≤ ξ′(A ∩ {f ≥ f′}) + ξ(A ∩ {f < f′})
 = ρ(A ∩ {f ≥ f′}) + ρ(A ∩ {f < f′}) = ρ(A) .

The proofs of (ii) and (iii) are along the same lines. Since |p − q| = p + q − 2 p∧q for p, q ≥ 0, we obtain, for all A ∈ 𝒳,


∫_A |f − f′| dµ = ∫_A f dµ + ∫_A f′ dµ − 2 ∫_A f ∧ f′ dµ .

This yields (iv). □
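For measures with densities with respect to a common dominating measure, items (iii) and (iv) reduce to pointwise operations on the densities; a minimal check on a finite space (the two measures below are illustrative):

```python
import numpy as np

# Two finite measures on {0,...,4}, given by their densities w.r.t. the
# counting measure (illustrative values).
xi = np.array([0.1, 0.3, 0.2, 0.1, 0.3])
xi2 = np.array([0.2, 0.1, 0.2, 0.4, 0.1])

minimum = np.minimum(xi, xi2)    # xi ∧ xi', by Proposition 6.7 (iii)
maximum = np.maximum(xi, xi2)    # xi ∨ xi'

# (iv): |xi - xi'| = xi + xi' - 2 (xi ∧ xi')
assert np.allclose(np.abs(xi - xi2), xi + xi2 - 2 * minimum)
# and the complementary identity xi ∨ xi' = xi + xi' - xi ∧ xi'
assert np.allclose(maximum, xi + xi2 - minimum)
print(minimum.sum())             # total mass of xi ∧ xi'
```

Note, in line with Remark 6.8, that it is the densities that are minimized pointwise; (ξ ∧ ξ′)(A) is in general strictly smaller than ξ(A) ∧ ξ′(A).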

Remark 6.8 It must be noted that ξ ∧ ξ′ is not defined by (ξ ∧ ξ′)(A) = ξ(A) ∧ ξ′(A), since this would not even define an additive set function. The same remark holds for ξ ∨ ξ′.

Proposition 6.9 For ξ, ξ′ ∈ M₁(𝒳),

d_TV(ξ, ξ′) = 1 − (ξ ∧ ξ′)(X) . (6.5)

There exist mutually singular measures η, η′ ∈ M₊(𝒳) such that

ξ = η + ξ ∧ ξ′ , ξ′ = η′ + ξ ∧ ξ′ . (6.6)

Proof. Let µ ∈ M₊(𝒳) be such that ξ = f · µ and ξ′ = f′ · µ (e.g. set µ = ξ + ξ′). Then µ(f) = µ(f′) = 1, d_TV(ξ, ξ′) = (1/2) µ(|f − f′|) and ξ ∧ ξ′ = (f ∧ f′) · µ. Thus, applying Proposition 6.7 (iii) yields

d_TV(ξ, ξ′) = (1/2) µ(|f − f′|) = (1/2) µ(f) + (1/2) µ(f′) − µ(f ∧ f′) = 1 − (ξ ∧ ξ′)(X) .

Finally, (6.6) is obtained by noting that

f = (f − f′) 1_{f ≥ f′} + f ∧ f′ , f′ = (f′ − f) 1_{f′ > f} + f ∧ f′ ,

and defining η = (f − f′) 1_{f ≥ f′} · µ and η′ = (f′ − f) 1_{f′ > f} · µ. □

If K and K′ are two bounded kernels on (X,𝒳), we define

K ∧ K′(x, x′; A) = [K(x,·) ∧ K′(x′,·)](A) , A ∈ 𝒳 . (6.7)

Proposition 6.10 Assume that 𝒳 is countably generated. Then K ∧ K′ is a kernel on (X×X) × 𝒳 and the function (x,x′) ↦ ‖K(x,·) − K′(x′,·)‖_TV is measurable.

Proof. By Definition 1.5, we only have to prove that, for every A ∈ 𝒳, the function (x,x′) ↦ [K(x,·) ∧ K′(x′,·)](A) is measurable. By assumption, there exists a countable collection 𝒜 generating 𝒳 (which may be assumed to be an algebra). This implies that for all x, x′ ∈ X and A ∈ 𝒳,

[K(x,·) − K′(x′,·)]₊(A) = [K(x,·) − K′(x′,·)](A ∩ S_{x,x′}) = sup_{B∈𝒜} [K(x,·) − K′(x′,·)](A ∩ B) ,

where S_{x,x′} is a Jordan set for K(x,·) − K′(x′,·). The supremum is taken over a countable set, so the function (x,x′) ↦ [K(x,·) − K′(x′,·)]₊(A) is measurable. Similarly, (x,x′) ↦ [K(x,·) − K′(x′,·)]₋(A) and (x,x′) ↦ |K(x,·) − K′(x′,·)|(A) are measurable.



Since ‖K(x,·) − K′(x′,·)‖_TV = [K(x,·) − K′(x′,·)]₊(X) + [K(x,·) − K′(x′,·)]₋(X), this proves our claim. □

The total variation distance can be interpreted in terms of coupling of probability measures, which we define now.

Definition 6.11 (Coupling of probability measures)
(i) A coupling of two probability measures ξ, ξ′ ∈ M₁(𝒳) is a probability measure ξ̄ on the product space (X×X, 𝒳⊗𝒳) whose marginals are ξ and ξ′, i.e. ξ̄(A×X) = ξ(A) and ξ̄(X×A) = ξ′(A) for any A ∈ 𝒳.
(ii) Let (Ω, F, P) be a probability space and X: Ω → X, X′: Ω → X be two random variables (measurable mappings w.r.t. F and 𝒳). The pair of random elements (X, X′): Ω → X² is called a coupling of the probability measures (ξ, ξ′) if (X, X′) is F/𝒳^{⊗2}-measurable, L_P(X) = ξ and L_P(X′) = ξ′; equivalently, the pushforward of P by the pair of random elements (X, X′) is a coupling of the probability measures ξ and ξ′.

Proposition 6.12 Let ξ, ξ′ ∈ M₁(𝒳) be two probability distributions on (X,𝒳). Then

d_TV(ξ, ξ′) = inf P(X ≠ X′) , (6.8)

where the infimum is taken over the set of all couplings (X, X′) of the probability measures (ξ, ξ′). Furthermore, this infimum is attained.

Proof. For any f ∈ F_b(X,𝒳) such that osc(f) ≤ 1, we get

|ξ(f) − ξ′(f)| = |E[f(X)] − E[f(X′)]| = |E[(f(X) − f(X′)) 1_{X ≠ X′}]| ≤ osc(f) P(X ≠ X′) .

Applying (6.2), this yields d_TV(ξ, ξ′) ≤ P(X ≠ X′) for every coupling (X, X′) of (ξ, ξ′). To prove the converse inequality, assume that (ξ ∧ ξ′)(X) = ε > 0. By Proposition 6.9, there exist two mutually singular measures ρ, ρ′ ∈ M₁(𝒳) such that

ξ = (1−ε)ρ + εζ , ξ′ = (1−ε)ρ′ + εζ ,

with ζ = ε^{−1} ξ ∧ ξ′. We can assume that (Ω, F, P) is rich enough to carry four independent random variables B, U, U′, V, where B is a Bernoulli random variable with probability of success ε, and U, U′ and V are X-valued random variables with distributions ρ, ρ′ and ζ, respectively. Set

X = (1−B)U + BV , X′ = (1−B)U′ + BV .

Then (X, X′) is a coupling of ξ and ξ′ and, since ρ and ρ′ are mutually singular, P(X = X′) = ε. By Proposition 6.9, d_TV(ξ, ξ′) = 1 − ε, so that P(X ≠ X′) = d_TV(ξ, ξ′). □



6.2 Fixed-point Theorem

We have just seen that the set of probability measures M₁(𝒳) endowed with the total variation distance is a complete metric space. A Markov kernel is an operator on this space, and an invariant probability measure is a fixed point of this operator. It is thus natural to use the fixed-point theorem to find conditions for the existence of an invariant measure and to obtain the convergence rate of the sequence of iterates of the kernel to the invariant probability measure. Although the theorem is well known, we state and prove it here in order to exhibit a rate of convergence and precise constants.

Theorem 6.13 (Fixed-Point Theorem). Let (F,d) be a complete metric space. Let T: F → F be a continuous operator such that, for some positive integer m, some α ∈ (0,1) and all u, v ∈ F,

d(T^m u, T^m v) ≤ α d(u,v) . (6.9)

Then there exists a unique fixed point a ∈ F and, for all u ∈ F,

d(T^n u, a) ≤ m(1−α)^{−1} max_{0≤r<m} d(T^r u, T^{r+1} u) α^{⌊n/m⌋} . (6.10)

Moreover, if for each r ∈ {1,…,m−1} there exists A_r such that d(T^r u, T^r v) ≤ A_r d(u,v) for all u, v ∈ F, then

d(T^n u, a) ≤ max_{1≤r<m} A_r d(u,a) α^{⌊n/m⌋} . (6.11)

Proof. Let us first prove uniqueness. If Ta = a and Tb = b, then T^m a = a and T^m b = b, thus

d(a,b) = d(T^m a, T^m b) ≤ α d(a,b) ,

which, since α ∈ (0,1), is possible only if d(a,b) = 0, i.e. a = b. To prove existence, consider u, v ∈ F and an integer n. Write n = km + r with r ∈ {0,…,m−1}. Then

d(T^n u, T^n v) = d(T^{km+r} u, T^{km+r} v) ≤ α^k d(T^r u, T^r v) .

Taking v = Tu we obtain

d(T^n u, T^{n+1} u) ≤ α^k d(T^r u, T^{r+1} u) = α^{⌊n/m⌋} d(T^r u, T^{r+1} u) ≤ α^{⌊n/m⌋} max_{0≤r<m} d(T^r u, T^{r+1} u) .

This implies that {T^n u, n ∈ ℕ} is a Cauchy sequence, and denoting its limit by a, we obtain

d(T^n u, a) ≤ max_{0≤r<m} d(T^r u, T^{r+1} u) ∑_{q=n}^∞ α^{⌊q/m⌋} ≤ m(1−α)^{−1} max_{0≤r<m} d(T^r u, T^{r+1} u) α^{⌊n/m⌋} .

Since T is continuous, Ta = T lim_{n→∞} T^n u = lim_{n→∞} T^{n+1} u = a, hence a is a fixed point. Assume now that d(T^r u, T^r v) ≤ A_r d(u,v) for all u, v and r ∈ {1,…,m−1}. For all u ∈ F,

d(T^n u, a) = d(T^n u, T^n a) ≤ α^{⌊n/m⌋} d(T^{n−m⌊n/m⌋} u, T^{n−m⌊n/m⌋} a)
 ≤ α^{⌊n/m⌋} A_{n−m⌊n/m⌋} d(u,a) ≤ α^{⌊n/m⌋} max_{1≤r≤m−1} A_r d(u,a) . □

We now apply the fixed-point theorem to a Markov kernel P, considered as an operator on a subset F of M₁(𝒳).

Theorem 6.14. Let F be a subset of M₁(𝒳) and d be a metric on F such that δ_x ∈ F for all x ∈ X and (F,d) is complete. Let P be a Markov kernel such that ξP ∈ F if ξ ∈ F. Assume that there exist a positive integer m, constants A_r > 0, r ∈ {1,…,m−1}, and α ∈ (0,1) such that, for all ξ, ξ′ ∈ F,

d(ξP^r, ξ′P^r) ≤ A_r d(ξ, ξ′) , r ∈ {1,…,m−1} , (6.12)
d(ξP^m, ξ′P^m) ≤ α d(ξ, ξ′) . (6.13)

Then there exists a unique invariant probability measure π ∈ F and, for all ξ ∈ F,

d(ξP^n, π) ≤ max_{1≤r<m} A_r d(ξ, π) α^{⌊n/m⌋} . (6.14)

Assume moreover that convergence of a sequence of probability measures in (F,d) implies its weak convergence. Then π is the unique P-invariant probability measure in M₁(𝒳).

Proof. We only need to prove the last part of the theorem, since the first part is a rephrasing of Theorem 6.13. Let π ∈ F be the unique invariant probability measure in F and let π̃ be an invariant probability measure in M₁(𝒳). Then, for every continuous bounded function f, we have

π̃(f) = π̃P^n(f) = ∫ P^n f(x) π̃(dx) .

By the first part of the theorem, the sequence {δ_x P^n, n ∈ ℕ} converges with respect to the distance d, hence weakly, to the probability π. Thus lim_{n→∞} P^n f(x) = π(f) for all x ∈ X and every continuous bounded function f. Since, in addition, |P^n f(x)| ≤ |f|_∞, the dominated convergence theorem implies that lim_{n→∞} ∫ P^n f(x) π̃(dx) = π(f), which yields π̃(f) = π(f) for every bounded continuous function f. Therefore, π̃ = π, which concludes the proof. □

Page 107: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x


The second part of the theorem is very important. It states that if convergence with respect to d implies weak convergence (i.e. the topology induced by d is finer than the topology of weak convergence), then the invariant probability measure is not only unique in F, but also in M₁(𝒳). If F = M₁(𝒳), then this condition is superfluous for the uniqueness of the invariant probability measure in M₁(𝒳).

6.3 Dobrushin coefficient and uniform ergodicity

Definition 6.15 (Dobrushin Coefficient) Let P be a Markov kernel on a measurable space (X,𝒳). The Dobrushin coefficient ∆(P) of P is the Lipschitz coefficient of P with respect to the total variation distance, i.e.

∆(P) = sup_{ξ ≠ ξ′ ∈ M₁(𝒳)} d_TV(ξP, ξ′P) / d_TV(ξ, ξ′) = sup_{ξ ≠ ξ′ ∈ M₁(𝒳)} ‖ξP − ξ′P‖_TV / ‖ξ − ξ′‖_TV . (6.15)

The Dobrushin coefficient is always at most 1, which implies that P defines a weak contraction on (M₁(𝒳), ‖·‖_TV).

Lemma 6.16 Let P be a Markov kernel on (X,𝒳). Then 0 ≤ ∆(P) ≤ 1.

Proof. Since |Pf|_∞ ≤ 1 whenever |f|_∞ ≤ 1, the characterization (6.1) of the total variation norm yields

‖ξP − ξ′P‖_TV = sup_{|f|_∞≤1} |ξP(f) − ξ′P(f)| = sup_{|f|_∞≤1} |ξ(Pf) − ξ′(Pf)|
 ≤ sup_{|f|_∞≤1} |ξ(f) − ξ′(f)| = ‖ξ − ξ′‖_TV . □

Corollary 6.17 If π is an invariant probability measure, then for any ξ ∈ M₁(𝒳) the sequence {‖ξP^n − π‖_TV, n ∈ ℕ} is nonincreasing.

Proof.

‖ξP^{n+1} − π‖_TV = ‖ξP^n P − πP‖_TV ≤ ‖ξP^n − π‖_TV . □

Lemma 6.18 Let P be a Markov kernel on (X,𝒳). Then

∆(P) = sup_{(x,x′)∈X×X} d_TV(δ_x P, δ_{x′} P) = (1/2) sup_{(x,x′)∈X×X} ‖P(x,·) − P(x′,·)‖_TV . (6.16)

Proof. Since d_TV(ξ, ξ′) ≤ 1 for any pair of probability measures ξ, ξ′, (6.16) implies that ∆(P) ≤ 1. We now prove (6.16). By the definition (6.15), the right-hand side of (6.16) is less than or equal to ∆(P) (take ξ = δ_x, ξ′ = δ_{x′}, for which d_TV(ξ, ξ′) = 1 when x ≠ x′). We now prove the converse inequality. By the definition of the total variation distance and homogeneity, it holds that

Page 108: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x


∆(P) = sup{‖ξP‖_TV : ξ ∈ M₀(𝒳), ‖ξ‖_TV ≤ 1} . (6.17)

Using (6.17), Theorem 6.3 and the bound (6.3), we have, for ξ ∈ M₀(𝒳) (noting that ξP ∈ M₀(𝒳)),

‖ξP‖_TV = 2 sup_{f: osc(f)≤1} |(ξP)(f)| = 2 sup_{f: osc(f)≤1} |ξ(Pf)| ≤ ‖ξ‖_TV sup_{f: osc(f)≤1} osc(Pf) .

Note now that

sup_{osc(f)≤1} osc(Pf) = sup_{osc(f)≤1} sup_{x,x′} |Pf(x) − Pf(x′)| = sup_{x,x′} sup_{osc(f)≤1} |[P(x,·) − P(x′,·)] f|
 = (1/2) sup_{x,x′} ‖P(x,·) − P(x′,·)‖_TV = sup_{x,x′} d_TV(δ_x P, δ_{x′} P) .

Thus, for ξ ∈ M₀(𝒳) such that ‖ξ‖_TV ≤ 1, we obtain

‖ξP‖_TV ≤ sup_{x,x′} d_TV(δ_x P, δ_{x′} P) .

This proves the converse inequality. □
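On a finite state space, (6.16) makes the Dobrushin coefficient directly computable as half the largest L¹ distance between rows of the transition matrix; a small numerical check of the contraction property (6.15) (the matrix below is an illustrative choice of ours):

```python
import numpy as np

def dobrushin(P):
    """Dobrushin coefficient via (6.16): (1/2) max_{x,x'} ||P(x,.) - P(x',.)||_TV."""
    n = P.shape[0]
    return max(0.5 * np.abs(P[i] - P[j]).sum() for i in range(n) for j in range(n))

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
delta = dobrushin(P)

# contraction check on random pairs of probability vectors, cf. (6.15)
rng = np.random.default_rng(1)
for _ in range(100):
    xi, xi2 = rng.dirichlet(np.ones(3), size=2)
    lhs = np.abs(xi @ P - xi2 @ P).sum()
    rhs = delta * np.abs(xi - xi2).sum()
    assert lhs <= rhs + 1e-12
print(delta)
```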

Theorem 6.19. Let (X,𝒳) be a measurable space. Let P be a Markov kernel such that, for some integer m, ∆(P^m) ≤ ρ < 1. Then P admits a unique invariant probability measure π. In addition, for all ξ ∈ M₁(𝒳),

‖ξP^n − π‖_TV ≤ ‖ξ − π‖_TV ρ^{⌊n/m⌋} ,

where ⌊u⌋ is the greatest integer smaller than or equal to u.

Proof. By Theorem 6.6, (M₁(𝒳), d_TV) is a complete metric space. On the other hand, convergence with respect to the total variation distance implies weak convergence. The claim then follows from Theorem 6.14 (with A_r = 1 for all r). □

Again, since the total variation distance between two probability measures is always at most 1, we have

d_TV(ξP^n, π) ≤ ρ^{⌊n/m⌋} . (6.18)

This means that the convergence is uniform with respect to the initial distribution and holds at a geometric rate.

It is natural to ask whether it is possible for the convergence to be uniform, but at a rate slower than geometric. Surprisingly, the answer is no. We first give definitions, and then prove this fact.

Definition 6.20 (Uniform ergodicity) Let P be a Markov kernel on (X,X ).

Page 109: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x


• P is uniformly ergodic if there exists a probability measure π such that lim_{n→∞} sup_{x∈X} ‖P^n(x,·) − π‖_TV = 0.
• P is uniformly geometrically ergodic if there exist a probability measure π and constants C ∈ [0,∞) and ρ ∈ [0,1) such that sup_{x∈X} ‖P^n(x,·) − π‖_TV ≤ C ρ^n for all n ∈ ℕ.

Uniform ergodicity is a valuable property, since it ensures that the convergence of the Markov chain does not depend on the chosen initial value (contrary to what the name might suggest, however, it is not always a guarantee of fast convergence to stationarity).

Proposition 6.21 Let P be a Markov kernel on a measurable space (X,𝒳). The following statements are equivalent.

(i) P is uniformly ergodic.
(ii) P is uniformly geometrically ergodic.
(iii) There exists a positive integer m such that ∆(P^m) < 1.

Proof. We already know that (iii) ⇒ (ii) and, by definition, (ii) ⇒ (i). Conversely, assume that P is uniformly ergodic. By definition, lim_{n→∞} sup_{x∈X} ‖P^n(x,·) − π‖_TV = 0. Thus there exists an integer m such that sup_{x∈X} ‖P^m(x,·) − π‖_TV < 1. The triangle inequality yields

(1/2) sup_{(x,x′)∈X×X} ‖P^m(x,·) − P^m(x′,·)‖_TV ≤ sup_{x∈X} ‖P^m(x,·) − π‖_TV < 1 .

By (6.16), this means that ∆(P^m) < 1. □

We now give a sufficient condition to obtain ∆(P^m) < 1. Recall the definition (6.7) of the minimum of two kernels.

Definition 6.22 (Doeblin's condition) Let m ∈ ℕ* and ε > 0. The Markov kernel P is said to satisfy the (m,ε)-Doeblin condition if, for every (x,x′) ∈ X×X,

[P^m(x,·) ∧ P^m(x′,·)](X) ≥ ε . (6.19)

A Markov kernel which satisfies the Doeblin condition is uniformly ergodic.

Theorem 6.23. If the Markov kernel P satisfies the (m,ε)-Doeblin condition, then ∆(P^m) ≤ 1 − ε and there exists a unique invariant probability measure π such that, for all ξ ∈ M₁(𝒳), d_TV(ξP^n, π) ≤ (1−ε)^{⌊n/m⌋}.

Proof. By Equation (6.5) of Proposition 6.9 and Lemma 6.18, the Dobrushin coefficient can be expressed as follows:

∆(P^m) = 1 − inf_{(x,x′)∈X²} [P^m(x,·) ∧ P^m(x′,·)](X) . (6.20)

Thus Doeblin's condition (6.19) implies ∆(P^m) ≤ 1 − ε and the conclusion follows from Theorem 6.19. □
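On a finite state space, the best ε in the (m,ε)-Doeblin condition (6.19) is the minimum over pairs of rows of the mass of P^m(x,·) ∧ P^m(x′,·); the sketch below (illustrative matrix, m = 1) computes it and verifies the geometric bound of Theorem 6.23.

```python
import numpy as np

def doeblin_eps(P, m=1):
    """Best eps in the (m, eps)-Doeblin condition (6.19), finite state space."""
    Pm = np.linalg.matrix_power(P, m)
    n = Pm.shape[0]
    return min(np.minimum(Pm[i], Pm[j]).sum() for i in range(n) for j in range(n))

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
eps = doeblin_eps(P)

# invariant probability = normalized left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

# Theorem 6.23 with m = 1: d_TV(delta_x P^n, pi) <= (1 - eps)^n
for n in range(1, 20):
    Pn = np.linalg.matrix_power(P, n)
    d = 0.5 * np.abs(Pn - pi).sum(axis=1).max()
    assert d <= (1 - eps) ** n + 1e-12
print(eps)
```

Note that here 1 − eps coincides with ∆(P), as predicted by (6.20).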

Page 110: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x


A sufficient condition for Doeblin's condition is the following one, referred to as the uniform Doeblin condition.

Lemma 6.24 If there exists a probability measure ν ∈ M₁(𝒳) such that, for every x ∈ X and A ∈ 𝒳,

P^m(x,A) ≥ ε ν(A) , (6.21)

then the Markov kernel P satisfies the (m,ε)-Doeblin condition.

Proof. By Proposition 6.7, [P^m(x,·) ∧ P^m(x′,·)] is the largest measure which is smaller than both P^m(x,·) and P^m(x′,·). Thus εν ≤ [P^m(x,·) ∧ P^m(x′,·)], and this yields (6.19). □

It will be shown in Theorem 6.28 that (6.21) is sufficient but not necessary for (6.19).

6.4 A coupling proof of uniform ergodicity

We can extend the definition of coupling from probability measures to stochastic processes. The definition below is simply a restatement of Definition 6.11 for X-valued stochastic processes, which can be seen as random elements of (X^ℕ, 𝒳^{⊗ℕ}).

Definition 6.25 (Coupling of stochastic processes) Let (Ω, F, P) and (Ω′, F′, P′) be two probability spaces and let X = {X_n, n ∈ ℕ} and X′ = {X′_n, n ∈ ℕ} be two X-valued stochastic processes defined on these probability spaces. The stochastic processes X and X′ can be seen as random elements taking values in (X^ℕ, 𝒳^{⊗ℕ}), and we denote by Q the pushforward of P by X (the distribution of X under P) and by Q′ the pushforward of P′ by X′ (the distribution of X′ under P′).

Let (Ω̄, F̄, P̄) be a probability space and (X̄, X̄′) = {(X̄_n, X̄′_n), n ∈ ℕ} be a stochastic process on (Ω̄, F̄, P̄) taking values in X^ℕ × X^ℕ. The pair (X̄, X̄′) is a coupling of X and X′ (or of the distributions Q and Q′) if the pushforward of P̄ by X̄ is equal to Q and the pushforward of P̄ by X̄′ is equal to Q′.

Proving the existence of such a coupling is an essential task. In the following subsections, this will be done by extending the canonical space.

We are interested in a particular type of coupling for which there exists an ℕ-valued random variable T such that

X̄_n = X̄′_n for all n ≥ T . (6.22)

Such a random variable T is called a coupling time. For instance, T can be defined as the smallest integer such that (6.22) holds:

T = inf{n ∈ ℕ : X̄_m = X̄′_m for all m ≥ n} . (6.23)

Theorem 6.26 (Lindvall). Let X = {X_n, n ∈ ℕ} and X′ = {X′_n, n ∈ ℕ} be random elements defined on the probability spaces (Ω, F, P) and (Ω′, F′, P′), taking values in (X^ℕ, 𝒳^{⊗ℕ}). Let (X̄, X̄′) be a coupling of X and X′ defined on a probability space (Ω̄, F̄, P̄), and let T be a coupling time. Then

d_TV(P({X_m, m ≥ n} ∈ ·), P′({X′_m, m ≥ n} ∈ ·)) ≤ P̄(T > n) .

Proof. Let Q_n and Q′_n denote the distributions of the sequences {X_m, m ≥ n} and {X′_m, m ≥ n}, respectively. By Proposition 6.12, we have

d_TV(Q_n, Q′_n) ≤ P̄(∃ m ≥ n : X̄_m ≠ X̄′_m) ≤ P̄(T > n) . □

Coupling for Markov chains

Let P be a Markov kernel on (X,𝒳) and (ξ, ξ′) be two probability distributions on (X,𝒳). Assume that there exists a probability space (Ω̄, F̄, P̄_{ξ⊗ξ′}) on which we can define a process {(X̄_n, X̄′_n), n ∈ ℕ} such that

(I) {(X̄_n, X̄′_n), n ∈ ℕ} is a Markov chain on the product space X×X with initial distribution ξ ⊗ ξ′,
(II) {X̄_n, n ∈ ℕ} is a Markov chain on X with kernel P and initial distribution ξ,
(III) {X̄′_n, n ∈ ℕ} is a Markov chain on X with kernel P and initial distribution ξ′.

Then, by Theorem 6.26, we will obtain

d_TV(ξP^n, ξ′P^n) ≤ P̄_{ξ⊗ξ′}(T > n) , (6.24)

for any coupling time T. We emphasize that it is not required that the two processes {X̄_n, n ∈ ℕ} and {X̄′_n, n ∈ ℕ} evolve independently; on the contrary, we want to define them in such a way that the probability of coupling, i.e. that the processes eventually become and stay equal to each other, is as large as possible. We now describe one particular coupling construction, in the case where P satisfies the (1,ε)-Doeblin condition.

Coupling under the (1,ε)-Doeblin condition

Define the kernels R and Q on (X×X) × 𝒳 as follows: for (x,x′) ∈ X×X and A ∈ 𝒳,

R(x,x′; A) = [P(x,·) ∧ P(x′,·)](A) / [P(x,·) ∧ P(x′,·)](X) , (6.25a)
Q(x,x′; A) = (1−ε)^{−1} [P(x,A) − ε R(x,x′; A)] . (6.25b)

By construction, for all (x,x′) ∈ X×X and A ∈ 𝒳,

P(x,A) = (1−ε) Q(x,x′; A) + ε R(x,x′; A) ,
P(x′,A) = (1−ε) Q(x′,x; A) + ε R(x,x′; A) .

Let us first describe the coupling construction informally. Let X̄₀ and X̄′₀ be random variables with marginal distributions ξ and ξ′. Note that there is no particular requirement on their joint distribution. Then, assuming X̄_k and X̄′_k are given for some k ≥ 0, we construct X̄_{k+1} and X̄′_{k+1} inductively as follows.

• If X̄_k = X̄′_k, let Y be a random variable with distribution P(X̄_k, ·), independent of X̄₀, X̄′₀, …, X̄_{k−1}, X̄′_{k−1}, and define X̄_{k+1} = X̄′_{k+1} = Y.
• If X̄_k ≠ X̄′_k, then draw a Bernoulli random variable B_{k+1} with probability of success ε, independently of X̄₀, X̄′₀, …, X̄_k, X̄′_k.
 – If B_{k+1} = 1, let Y be a random variable with distribution R(X̄_k, X̄′_k; ·), independent of X̄₀, X̄′₀, …, X̄_{k−1}, X̄′_{k−1}, and define X̄_{k+1} = X̄′_{k+1} = Y.
 – If B_{k+1} = 0, then draw X̄_{k+1} and X̄′_{k+1} with respective distributions Q(X̄_k, X̄′_k; ·) and Q(X̄′_k, X̄_k; ·), independent of X̄₀, X̄′₀, …, X̄_{k−1}, X̄′_{k−1}. Here again, the joint distribution of (X̄_{k+1}, X̄′_{k+1}) is irrelevant.

More formally, we define a Markov chain {(X̄_n, X̄′_n, B_n), n ∈ ℕ} on the canonical space X^ℕ × X^ℕ × {0,1}^ℕ with initial distribution ξ ⊗ ξ′ ⊗ ν_ε (where ν_ε({1}) = 1 − ν_ε({0}) = ε) and Markov kernel P̄ defined as follows: for x ≠ x′ ∈ X, A, B ∈ 𝒳, C ⊂ {0,1},

P̄(x,x,1; A×B×C) = P̄(x,x,0; A×B×C) = P(x, A∩B) ν_ε(C) ,
P̄(x,x′,1; A×B×C) = P̄(x,x′,0; A×B×C) = ε 1_C(1) R(x,x′; A∩B) + (1−ε) 1_C(0) H(x,x′; A×B) ,

where H is a Markov kernel on (X×X, 𝒳⊗𝒳) such that

H(x,x′; A×X) = Q(x,x′; A) , H(x,x′; X×B) = Q(x′,x; B) .

For instance, X̄_{k+1} and X̄′_{k+1} can be drawn independently, that is, H(x,x′; A×B) = Q(x,x′; A) Q(x′,x; B), but this may not be the optimal choice.

This construction ensures that {X̄_n, n ∈ ℕ} and {X̄′_n, n ∈ ℕ} are Markov chains on X with kernel P and initial distributions ξ and ξ′, respectively. Therefore, {(X̄_n, X̄′_n), n ∈ ℕ} is a coupling of the distributions P_ξ and P_{ξ′} of the Markov chains with kernel P and respective initial distributions ξ and ξ′.

Proposition 6.27 For all n ∈ ℕ and A ∈ 𝒳,

P̄(X̄_{n+1} ∈ A | F^{X̄}_n) = P(X̄_n, A) , P̄(X̄′_{n+1} ∈ A | F^{X̄′}_n) = P(X̄′_n, A) . (6.26)

Proof. By construction, we have

Page 113: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x

160 6 Uniformly ergodic Markov chains

\[
\begin{aligned}
\mathbb P\left( X_{k+1} \in A \,\middle|\, \mathcal F^{(X,X')}_k \right)
&= \mathbb P\left( X_{k+1} \in A \,\middle|\, \mathcal F^{(X,X')}_k \right) \mathbb 1_{\{X_k = X'_k\}}
 + \mathbb P\left( X_{k+1} \in A \,\middle|\, \mathcal F^{(X,X')}_k \right) \mathbb 1_{\{X_k \neq X'_k\}} \\
&= P(X_k, A)\, \mathbb 1_{\{X_k = X'_k\}}
 + \{\varepsilon R(X_k, X'_k; A) + (1-\varepsilon) Q(X_k, X'_k; A)\}\, \mathbb 1_{\{X_k \neq X'_k\}} \\
&= P(X_k, A) .
\end{aligned}
\]

Thus, {X_n} is a Markov chain with Markov kernel P and initial distribution ξ with respect to the filtration F^{(X,X')}. This yields (6.26), since a Markov chain with respect to one filtration is also a Markov chain with respect to its natural filtration; cf. Remark 1.21. □

The coupling time T is defined as in (6.23) as the first positive instant n such that X_n = X'_n, with the convention that T = ∞ if this never arises. Since coupling occurs when B_k = 1, the coupling time can also be defined as T = inf{k ≥ 1 : B_k = 1}. Since {B_n, n ∈ N} is a sequence of i.i.d. Bernoulli random variables with probability of success ε, the coupling time T has a geometric distribution with mean 1/ε, i.e.

\[
\mathbb P(T > n) = (1-\varepsilon)^n .
\]

We conclude that for any initial distributions ξ , ξ ′,

\[
d_{\mathrm{TV}}(\xi P^n, \xi' P^n) \le (1-\varepsilon)^n .
\]

In particular, if π is the unique invariant probability measure, this yields that d_TV(ξP^n, π) ≤ (1−ε)^n, in accordance with (6.18).

Coupling under the (m,ε)-Doeblin condition

If the kernel P satisfies the (m,ε)-Doeblin condition with m > 1, then it is much more difficult, but still possible, to build a coupling that satisfies (6.24). However, in order to obtain the rate of convergence (6.18), the previous coupling construction can be applied to the kernel P^m to obtain an alternate proof that ∆(P^m) ≤ 1 − ε. Since ∆(P^r) ≤ 1 for all r ≥ 1, we can apply the Fixed Point Theorem 6.14 to obtain that d_TV(ξP^n, π) ≤ (1−ε)^{⌊n/m⌋}.

6.5 Examples

6.5.1 Finite and countable state-space

If the state space X is finite and the chain is irreducible and aperiodic, then we can fix x_0 ∈ X, and it follows from Proposition 3.22 that there exists N such that P^n(x_0, x_0) > 0 for all n ≥ N. Since the chain is irreducible, for each x ∈ X there exists an m(x)


such that P^{m(x)}(x, x_0) > 0; hence for m ≥ max_{x∈X}(m(x) + N) we have P^m(x, x_0) > 0, and so P^m(x,A) ≥ εδ_{x_0}(A) for every x ∈ X, where ε = inf_{x′∈X} P^m(x′, x_0) > 0. Hence, the uniform Doeblin condition (6.21) holds with ν = δ_{x_0}. Thus, every irreducible and aperiodic chain on a finite state space is uniformly ergodic. The constant ε might in fact be rather small if the space is large. As shown in the following examples, sharper bounds might be obtained by verifying directly the non-uniform condition (6.19) rather than the uniform one (6.21).

For a countable state space, if there exists an integer m such that

\[
\varepsilon_m := \inf_{(x,x') \in \mathsf X \times \mathsf X} \sum_{z \in \mathsf X} P^m(x,z) \wedge P^m(x',z) > 0 ,
\]

then the Markov kernel P satisfies the (m,ε_m)-Doeblin condition (6.19). If there exists an integer m such that

\[
\varepsilon_m = \sum_{y \in \mathsf X} \inf_{x \in \mathsf X} P^m(x,y) > 0 ,
\]

then the Markov kernel P satisfies the (m,ε_m)-uniform Doeblin condition (6.21) with

\[
\nu_m(z) = \frac{\inf_{x \in \mathsf X} P^m(x,z)}{\sum_{y \in \mathsf X} \inf_{x \in \mathsf X} P^m(x,y)} .
\]

If the state space is finite, let #(X) denote its cardinality. Assume that there exists an integer m such that, for all x ∈ X, #{y ∈ X : P^m(x,y) > 0} > #(X)/2; then the Markov kernel P is uniformly ergodic. Indeed, this condition obviously implies that inf_{(x,x′)} Σ_{z∈X} P^m(x,z) ∧ P^m(x′,z) = ε_m > 0, and thus the Markov kernel satisfies the (m,ε_m)-Doeblin condition (6.19).

Example 6.28. Consider the Markov chain on X = {1,2,3} with stationary distribution the uniform distribution π = (1/3, 1/3, 1/3) on X, and with transition probabilities given by

\[
P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \end{pmatrix} .
\]

Then P(x,·) ∧ P(x′,·) ≥ (1/2) δ_{z(x,x′)}, where z(1,2) = 2, z(1,3) = 1, z(2,3) = 3. Hence, Theorem 6.23 shows that ∆(P) = 1/2 and Theorem 6.19 implies

\[
d_{\mathrm{TV}}(P^n(x,\cdot), \pi) \le (1/2)^n , \qquad n \in \mathbb N .
\]

On the other hand, the kernel P does not satisfy the uniform Doeblin condition (6.21) with m = 1 for any ε > 0. For m = 2,

\[
P^2 = \begin{pmatrix} 1/4 & 1/2 & 1/4 \\ 1/4 & 1/4 & 1/2 \\ 1/2 & 1/4 & 1/4 \end{pmatrix} .
\]


Hence, P^2(x,A) ≥ (3/4)π(A) for every x ∈ X and A ⊂ X. Therefore, the uniform Doeblin condition (6.21) is satisfied with m = 2 and ε = 3/4. Hence ∆(P^2) = 1/4 and Theorem 6.19 yields that

\[
d_{\mathrm{TV}}(P^n(x,\cdot), \pi) \le (1/4)^{\lfloor n/2 \rfloor} =
\begin{cases} (1/2)^n & \text{if } n \text{ is even,} \\ (1/2)^{n-1} & \text{if } n \text{ is odd.} \end{cases}
\]

This second bound is essentially the same as the first one and both are (nearly)optimal since the modulus of the second largest eigenvalue of the matrix P is equalto 1/2.
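For this three-state example everything is computable, so the bound can be checked directly; the following sketch computes d_TV(P^n(x,·), π) = ½‖P^n(x,·) − π‖₁ for each starting point and compares it with (1/2)^n:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
pi = np.full(3, 1 / 3)

def dtv(n):
    """max over x of d_TV(P^n(x,.), pi) for the kernel above."""
    Pn = np.linalg.matrix_power(P, n)
    return max(0.5 * np.abs(Pn[x] - pi).sum() for x in range(3))
```

Computing `dtv(n)` for moderate n confirms the geometric decay, and the eigenvalues of P confirm that the second largest modulus is 1/2, so the rate cannot be improved.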

Example 6.29 (Ehrenfest's urn). Ehrenfest's urn was introduced in Section 1.9.4. It is periodic with period 2. In order to simplify the discussion, we will make it aperiodic by assuming that, instead of always jumping from one state to an adjacent one, it may remain at the same state with probability 1/2. It is then a Markov chain with state space {0, . . . , N} and transition matrix P defined by

\[
P(i, i+1) = \frac{N-i}{2N} , \quad i = 0, \ldots, N-1 , \qquad
P(i, i) = \frac{1}{2} , \quad i = 0, \ldots, N , \qquad
P(i, i-1) = \frac{i}{2N} , \quad i = 1, \ldots, N .
\]

Then,

\[
P(i, \cdot) \ge \frac{1}{2N}\, \nu ,
\]

where ν is the uniform measure on {0, . . . , N}. This shows that the (1, 1/(2N))-uniform Doeblin condition (6.21) holds.
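As a quick numerical sanity check on this lazy walk, the sketch below (with an arbitrary choice N = 10) builds the transition matrix and verifies that the binomial distribution Bin(N, 1/2), the classical stationary law of the Ehrenfest urn, is indeed invariant for the lazy version as well:

```python
import numpy as np
from math import comb

N = 10
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    P[i, i] = 0.5                       # lazy move: stay put
    if i < N:
        P[i, i + 1] = (N - i) / (2 * N)
    if i > 0:
        P[i, i - 1] = i / (2 * N)

# Binomial(N, 1/2) weights: the stationary distribution of the urn.
pi = np.array([comb(N, i) for i in range(N + 1)]) / 2.0**N
```

Laziness does not change the invariant law: if πP = π for the original kernel, then π(½I + ½P) = π as well.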

Example 6.30 (Random-scan Gibbs sampler). Consider the random-scan Gibbs sampler for an everywhere-positive probability distribution π on the state space X = {0,1}^d, i.e. the vertices of a d-dimensional hypercube (so that |X| = 2^d). Given X_k = (x_1, …, x_d), the next state X_{k+1} = (z_1, …, z_d) is obtained by (a) choosing I_{k+1} uniformly on the index set {1, 2, …, d}; (b) setting z_i = x_i for i ≠ I_{k+1}; and (c) for i = I_{k+1}, choosing z_i to be 0 or 1 conditionally independently, according to the probabilities π_{i,x}(ℓ), ℓ ∈ {0,1},

\[
\pi_{i,x}(\ell) = \frac{\pi((x_1, \ldots, x_{i-1}, \ell, x_{i+1}, \ldots, x_d))}{\pi((x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_d)) + \pi((x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_d))} .
\]

Denote M = min_{x∈X} π(x) / max_{x∈X} π(x), so that 0 < M ≤ 1 and M/(1+M) ≤ π_{i,x}(ℓ) ≤ 1/(1+M).

We shall assume for simplicity that d is even. We shall prove that the Markov chain satisfies the (d/2, ε)-Doeblin condition. We use the following coupling construction. We run two chains from x and x′ simultaneously, coupled so that when one


chain is updating site i ∈ {1, . . . , d}, the other chain is updating site d + 1 − i. Consider making d/2 updates. In order for all sites to have been updated by one chain or the other after d/2 updates, it is necessary for the chain started from x (say) not to repeat visiting any one site i or its “complement” d + 1 − i. This happens with probability (d/2)! 2^{d/2} / d^{d/2}. In each of the first d updates (d/2 from each chain), there is a probability at least M/(1+M) of matching the other chain in that coordinate. It follows that the state space X is a (d/2, ε)-Doeblin set, where

\[
\varepsilon \ge \frac{M^d}{(1+M)^d} \cdot \frac{(d/2)!\, 2^{d/2}}{d^{d/2}} .
\]

The uniform Doeblin condition cannot be satisfied if m < d. For m = d, the (d, ε̃, π)-uniform Doeblin condition is satisfied, where ε̃ is any constant satisfying M^d d! d^{−d} ≤ ε̃ ≤ M^{−d} d! d^{−d}. Note indeed that there is probability d! d^{−d} that I_1, …, I_d are all distinct and, given that they are, the probability of hitting a given site x ∈ X after d steps belongs to the interval [M^d π(x), M^{−d} π(x)]. If π = π_1 × ··· × π_d is a product measure, then the coordinates move independently and we have ε̃ = d! d^{−d} exactly.

In the case where π is uniform (M = 1), Stirling's formula gives (for large d) that ε ≈ (πd)^{1/2} e^{−d/2} for the (d/2, ε)-Doeblin condition and ε̃ ≈ (2πd)^{1/2} e^{−d} for the (d, ε̃, π)-uniform Doeblin condition. This indicates that ε > ε̃ for sufficiently large d. We conclude that, for this example, the Doeblin condition gives better bounds than the uniform Doeblin condition.

6.5.2 General state-space

In many situations, the uniform ergodicity of a chain holds when the space is compact, provided that the Markov kernel of the chain is suitably regular.

Consider for example a Markov kernel such that, for some integer m, P^m has a component with a density p_m(x,y) with respect to a reference measure λ, i.e. for all x ∈ X and A ∈ X,

\[
P^m(x,A) \ge \int_A p_m(x,y)\, \lambda(\mathrm d y) .
\]

Assume that

\[
\varepsilon_m = \inf_{(x,x') \in \mathsf X \times \mathsf X} \int_{\mathsf X} p_m(x,y) \wedge p_m(x',y)\, \lambda(\mathrm d y) > 0 .
\]

Then [P^m(x,·) ∧ P^m(x′,·)](X) ≥ ε_m and the Markov kernel P satisfies the (m,ε_m)-Doeblin condition.

Assume moreover that there exists a nonnegative measurable function g_m such that g_m(y) ≤ inf_{x∈X} p_m(x,y) for λ-a.e. y ∈ X,


and define ε_m = ∫_X g_m(y) λ(dy). If ε_m > 0, define

\[
\nu_m(A) = \varepsilon_m^{-1} \int_A g_m(y)\, \lambda(\mathrm d y) .
\]

Then for every x ∈ X and A ∈ X,

\[
P^m(x,A) \ge \int_A p_m(x,y)\, \lambda(\mathrm d y) \ge \varepsilon_m \nu_m(A) .
\]

That is, the (m, ε_m)-uniform Doeblin condition holds.

In the case of the Metropolis-Hastings algorithm of ??, this condition holds for instance if the state space is compact, the candidate density q(x,y) is continuous in x for all y and positive for all x, y (which we are free to arrange, since we choose q), and π is continuous and positive everywhere. This follows easily from the form of the accept-reject rules.

Example 6.31 (Functional autoregressive model of order 1). Consider the functional autoregressive model X_k = f(X_{k−1}) + σ(X_{k−1}) Z_k, where {Z_k, k ∈ N} are i.i.d. standard Gaussian random variables, f and σ are bounded measurable functions, and there exist a, b > 0 such that a ≤ σ²(x) ≤ b for all x ∈ R. The Markov kernel of this chain has a density with respect to the Lebesgue measure given by

\[
p(x,y) = \frac{1}{\sqrt{2\pi\sigma^2(x)}} \exp\left( -\frac{1}{2\sigma^2(x)} (y - f(x))^2 \right) .
\]

Then, for y ∈ [−1, 1], we have

\[
\inf_{x \in \mathbb R} p(x,y) \ge \frac{1}{\sqrt{2\pi b}} \exp\left( -\frac{1}{2a} \left\{ \Big(y - \inf_{x \in \mathbb R} f(x)\Big)^2 \vee \Big(y - \sup_{x \in \mathbb R} f(x)\Big)^2 \right\} \right) > 0 .
\]

Thus the Markov kernel P satisfies the (1,ε)-uniform Doeblin condition and ∆(P) ≤ 1 − ε, with ε = ∫_{−1}^{1} inf_{x∈R} p(x,y) dy.
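For a concrete instance, ε can be evaluated numerically from this lower bound. The choices below are ours: they correspond to, e.g., f(x) = cos x and σ²(x) = 1 + sin²(x)/2, so that a = 1, b = 3/2, inf f = −1 and sup f = 1.

```python
import numpy as np

a, b = 1.0, 1.5                 # bounds on sigma^2(x)
f_inf, f_sup = -1.0, 1.0        # bounds on f (e.g. f(x) = cos x)

def lower_density(y):
    """Lower bound on inf_x p(x, y), as in the example."""
    worst = max((y - f_inf)**2, (y - f_sup)**2)
    return np.exp(-worst / (2 * a)) / np.sqrt(2 * np.pi * b)

# Riemann approximation of eps = int_{-1}^{1} inf_x p(x, y) dy
ys = np.linspace(-1.0, 1.0, 4001)
vals = np.array([lower_density(y) for y in ys])
eps = float(vals.mean() * 2.0)
```

For these constants the bound gives ε ≈ 0.22, so ∆(P) ≤ 0.78: even a crude uniform minorization yields an explicit geometric rate.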

Example 6.32 (Independent Metropolis-Hastings sampler). We consider again the independent Metropolis-Hastings algorithm (see ??, Theorem 1.58). Let µ be a σ-finite measure on (X, X). Let π denote the density with respect to µ of the target distribution and q the proposal density. Assume that sup_{x∈X} π(x)/q(x) < ∞. Given X_{t−1}, a proposal Y_t is drawn from the distribution q, independently of the past. Then, set X_t = Y_t with probability α(X_{t−1}, Y_t), where

\[
\alpha(x,y) := \frac{\pi(y)\, q(x)}{\pi(x)\, q(y)} \wedge 1 .
\]

Otherwise, set X_t = X_{t−1}. The transition kernel P of the Markov chain is defined, for (x,A) ∈ X × X, by

\[
P(x,A) = \int_A q(y)\alpha(x,y)\, \mu(\mathrm d y) + \left[ 1 - \int q(y)\alpha(x,y)\, \mu(\mathrm d y) \right] \delta_x(A) .
\]


As shown in ??, the transition kernel P is reversible with respect to π. Hence, π is a stationary distribution for P. Assume now that

there exists η > 0 such that for all x ∈ X, q(x)≥ ηπ(x). (6.27)

Then, the kernel P is uniformly ergodic. Indeed, for all x ∈ X and A ∈ X, we have

\[
\begin{aligned}
P(x,A) &\ge \int_A \left( \frac{\pi(y)\, q(x)}{\pi(x)\, q(y)} \wedge 1 \right) q(y)\, \mu(\mathrm d y) \\
&= \int_A \left( \frac{q(x)}{\pi(x)} \wedge \frac{q(y)}{\pi(y)} \right) \pi(y)\, \mu(\mathrm d y) \ge \eta\, \pi(A) .
\end{aligned}
\]

Consider the situation in which X = R and π = N(0,1). If the true mean is unknown, then a possible choice for the proposal density might be the normal density centered at some known fixed value, e.g. q might be taken to be the density of the N(1,1) distribution. The acceptance ratio would then be

\[
\alpha(x,y) = 1 \wedge \frac{\pi(y)\, q(x)}{\pi(x)\, q(y)} = 1 \wedge \mathrm e^{x-y} .
\]

This choice implies that moves to the right may be rejected, but moves to the left are always accepted, due to the relative positions of the means of the proposal and target distributions. Condition (6.27) is not satisfied in this case, and it can be shown that the algorithm cannot converge at a geometric rate to the target distribution.

If on the other hand the mean is known but the variance (which is equal to 1) is unknown, then we may take the proposal density q to be N(0,σ²) for some known σ² > 1. Then q(x)/π(x) ≥ σ^{−1} and (6.27) holds. This shows that the transition kernel P satisfies Doeblin's condition with ε = σ^{−1}. The Markov chain is uniformly geometrically ergodic, with d_TV(P^n(x,·), π) ≤ (1 − σ^{−1})^n. This quantifies precisely the importance of choosing σ close to the true value of the unknown variance.
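This well-behaved case is easy to simulate. The sketch below implements the independent Metropolis-Hastings chain targeting N(0,1) with an N(0,σ²) proposal; the value σ = 1.5 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.5                      # proposal std; sigma > 1 ensures (6.27)

def log_ratio(x, y):
    """log of pi(y)q(x) / (pi(x)q(y)) for pi = N(0,1), q = N(0, sigma^2)."""
    return 0.5 * (x**2 - y**2) * (1.0 - 1.0 / sigma**2)

def imh(n_steps, x0=0.0):
    """Independent Metropolis-Hastings chain; returns the sample path."""
    x, out = x0, np.empty(n_steps)
    for t in range(n_steps):
        y = sigma * rng.standard_normal()           # proposal from q
        if np.log(rng.random()) < log_ratio(x, y):  # accept with prob alpha
            x = y
        out[t] = x
    return out

samples = imh(50_000)
```

With σ > 1 the chain couples at rate at least σ^{−1} per step, so empirical moments of the output settle quickly near those of N(0,1).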

Example 6.33 (Slice sampler). Consider the slice sampler as described in Theorem 1.62, in the particular situation where k = 1 and f_0 = 1, f_1 = π. In this case, given X_n, the random variable Y_{n+1} is drawn according to U(0, π(X_n)), and then X_{n+1} is drawn according to a density proportional to x ↦ 1{π(x) ≥ Y_{n+1}}. The Markov kernel P of {X_n, n ∈ N} may thus be written as: for all (x,B) ∈ X × X,

\[
P(x,B) = \frac{1}{\pi(x)} \int_0^{\pi(x)} \frac{\mathrm{Leb}(B \cap L(y))}{\mathrm{Leb}(L(y))}\, \mathrm d y , \tag{6.28}
\]

where L(y) := {x′ ∈ X : π(x′) ≥ y}. Assume that π is bounded and that the topological support S_π of π is such that Leb(S_π) < ∞. Under these assumptions, we will show that P is uniformly ergodic. Denote M := sup_{x∈X} π(x). By combining (6.28) with Fubini's theorem, we obtain, for all x ∈ X,


\[
\begin{aligned}
P(x,B) &= \int_B \left( \frac{1}{\pi(x)} \int_0^{\pi(x)} \frac{\mathbb 1\{\pi(x') \ge y\}}{\mathrm{Leb}(L(y))}\, \mathrm d y \right) \mathrm d x' \\
&\ge \frac{1}{\mathrm{Leb}(S_\pi)} \int_B \frac{\pi(x) \wedge \pi(x')}{\pi(x)}\, \mathrm d x' \\
&\ge \frac{1}{\mathrm{Leb}(S_\pi)} \int_B 1 \wedge \frac{\pi(x')}{M}\, \mathrm d x' .
\end{aligned}
\]

Hence the uniform Doeblin condition (6.21) holds with ν proportional to x′ ↦ 1 ∧ π(x′)/M on S_π, and the claim follows.
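When the level sets L(y) are intervals, the slice sampler is only a few lines of code. The example below is ours: the target is the Beta(2,2) density π(x) = 6x(1−x) on [0,1], for which L(y) = [½ − h(y), ½ + h(y)] with h(y) = ½√(1 − 2y/3) in closed form.

```python
import numpy as np

rng = np.random.default_rng(7)

def pi(x):
    """Beta(2,2) density on [0, 1]."""
    return 6.0 * x * (1.0 - x)

def slice_step(x):
    """One slice-sampler update: Y ~ U(0, pi(x)), then X' uniform on L(Y)."""
    y = rng.uniform(0.0, pi(x))
    half = 0.5 * np.sqrt(1.0 - 2.0 * y / 3.0)   # L(y) = [1/2 - half, 1/2 + half]
    return rng.uniform(0.5 - half, 0.5 + half)

x, draws = 0.5, []
for _ in range(20_000):
    x = slice_step(x)
    draws.append(x)
draws = np.array(draws)
```

The empirical mean and variance of the output approach 1/2 and 1/20, the moments of Beta(2,2), illustrating the fast mixing guaranteed by the uniform minorization.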


Chapter 7
V-geometrically ergodic Markov chains

Uniform ergodicity (or, equivalently, the Doeblin condition) is restrictive and does not hold for many models used in practice. Nevertheless, in many situations of interest, the iterates of the Markov kernel converge to the invariant distribution at a geometric rate, though not uniformly. This means that there exist ρ ∈ (0,1) and a function M : X → R_+ such that ‖P^n(x,·) − π‖_TV ≤ M(x)ρ^n for all n ∈ N and x ∈ X.

The method described in this chapter for proving geometric ergodicity requires that one establish a drift condition and an associated minorization condition for the underlying Markov kernel P. Once these conditions have been established, quantitative estimates of the rate at which the iterates of the Markov kernel approach the invariant distribution can be obtained. These quantitative estimates can be employed to calculate a bound on the number of iterations needed to achieve a prescribed distance between the iterate of the kernel and the invariant distribution.

7.1 V-total variation

Definition 7.1 (V-norm) Let V ∈ F(X, X) with values in [1, ∞). The space of finite signed measures ξ such that |ξ|(V) < ∞ is denoted by M_V(X).

(i) The V-norm of a function f ∈ F(X, X), denoted |f|_V, is defined by

\[
|f|_V = \sup_{x \in \mathsf X} \frac{|f(x)|}{V(x)} .
\]

(ii) The V-norm of a measure ξ ∈ M_V(X), denoted ‖ξ‖_V, is defined by

\[
\|\xi\|_V = |\xi|(V) .
\]

Of course, when V = 1_X, then ‖ξ‖_{1_X} = ‖ξ‖_TV by (6.1). It also holds that ‖ξ‖_V = ‖V·ξ‖_TV. We now give characterizations of the V-norm similar to the characterizations of the TV-norm provided in Theorem 6.3. Define


\[
\mathrm{osc}_V(f) := \sup_{(x,x') \in \mathsf X \times \mathsf X} \frac{|f(x) - f(x')|}{V(x) + V(x')} . \tag{7.1}
\]

Theorem 7.2. For ξ ∈ M_V(X),

\[
\|\xi\|_V = \sup\{ \xi(f) : f \in \mathrm F(\mathsf X, \mathcal X),\ |f|_V \le 1 \} . \tag{7.2}
\]

Let ξ ∈ M_0(X) ∩ M_V(X). Then,

\[
\|\xi\|_V = \sup\{ \xi(f) : \mathrm{osc}_V(f) \le 1 \} . \tag{7.3}
\]

Proof. Equation (7.2) follows from

\[
\|\xi\|_V = \|V \cdot \xi\|_{\mathrm{TV}} = \sup_{|f|_\infty \le 1} \xi(V f) = \sup_{|f|_V \le 1} \xi(f) .
\]

Let S be a Jordan set for ξ. Since ξ(V 1_S − V 1_{S^c}) = |ξ|(V) and osc_V(V 1_S − V 1_{S^c}) = 1, we obtain that

\[
\|\xi\|_V = |\xi|(V) \le \sup\{ |\xi(f)| : \mathrm{osc}_V(f) \le 1 \} .
\]

Since inf_{c∈R} |f − c|_V ≤ osc_V(f) and ξ(f) = ξ(f − c) for ξ ∈ M_0(X), we obtain

\[
|\xi(f)| = |\xi(f-c)| \le \|\xi\|_V\, |f-c|_V ,
\]

and taking the infimum over c ∈ R yields sup{|ξ(f)| : osc_V(f) ≤ 1} ≤ ‖ξ‖_V. □

Note that when V = 1_X, then osc_V(f) = osc(f)/2, and thus Theorem 6.3 can be seen as a particular case of Theorem 7.2. We also have the following bound, which is similar to (6.3):

\[
|\xi(f)| \le \|\xi\|_V\, \mathrm{osc}_V(f) . \tag{7.4}
\]

We now state a very important completeness result.

Proposition 7.3 The space (MV (X ),‖·‖V) is complete.

Proof. Let {ξ_n, n ∈ N} be a Cauchy sequence in M_V(X). Define

\[
\lambda = \sum_{n=0}^{\infty} \frac{1}{2^n\, |\xi_n|(V)}\, |\xi_n| ,
\]

which is a measure, as a limit of an increasing sequence of measures. By construction, λ(V) < ∞ and |ξ_n| ≪ λ for any n ∈ N. Therefore, there exist functions f_n ∈ L¹(V·λ) such that ξ_n = f_n · λ and ‖ξ_n − ξ_m‖_V = ∫ |f_n − f_m| V dλ. This implies that {f_n, n ∈ N} is a Cauchy sequence in L¹(V·λ), which is complete. Thus, there exists f ∈ L¹(V·λ) such that f_n → f in L¹(V·λ). Setting ξ = f · λ, we obtain that ξ ∈ M_V(X) and lim_{n→∞} ‖ξ_n − ξ‖_V = lim_{n→∞} ∫ |f_n − f| V dλ = 0. □


Define now the set

\[
\mathrm M_{1,V}(\mathcal X) = \{ \xi \in \mathrm M_1(\mathcal X) : \xi(V) < \infty \} . \tag{7.5}
\]

Define the distance d_V on M_{1,V}(X) by

\[
d_V(\xi, \xi') = \frac{1}{2} \left\| \xi - \xi' \right\|_V . \tag{7.6}
\]

Proposition 7.3 yields the following corollary, which will be crucial in the next section.

Corollary 7.4 The space (M1,V (X ),dV ) is complete.

7.2 V -Dobrushin coefficient

We extend the definition of the Dobrushin coefficient by replacing the total variationdistance by the V -distance.

Definition 7.5 (V -Dobrushin Coefficient) Let P be a Markov kernel on (X,X )such that, for every ξ ∈M1,V (X ), ξ P∈M1,V (X ). The V -Dobrushin coefficient ofthe Markov kernel P, denoted ∆V (P), is defined by

∆V (P) = supξ 6=ξ ′∈M1,V (X )

dV (ξ P,ξ ′P)dV (ξ ,ξ ′)

= supξ 6=ξ ′∈M1,V (X )

‖ξ P−ξ ′P‖V‖ξ −ξ ′‖V

(7.7)

Contrary to the Dobrushin coefficient, the V-Dobrushin coefficient is not necessarily finite, unless the function V is bounded. An equivalent expression of the V-Dobrushin coefficient, similar to (6.16), is available.

Lemma 7.6 Let P be a Markov kernel on (X, X). Then,

\[
\Delta_V(P) = \sup_{\mathrm{osc}_V(f) \le 1} \mathrm{osc}_V(P f) = \sup_{(x,y) \in \mathsf X \times \mathsf X} \frac{\|P(x,\cdot) - P(y,\cdot)\|_V}{V(x) + V(y)} . \tag{7.8}
\]

Proof. Since ‖δ_x − δ_y‖_V = V(x) + V(y), the right-hand side of (7.8) is obviously less than or equal to ∆_V(P). By Theorem 7.2 and (7.4), we have, for all probabilities ξ, ξ′,

\[
\begin{aligned}
\left\| \xi P - \xi' P \right\|_V &= \sup_{f : \mathrm{osc}_V(f) \le 1} |\xi P(f) - \xi' P(f)| \\
&= \sup_{f : \mathrm{osc}_V(f) \le 1} |\xi(P f) - \xi'(P f)| \le \left\| \xi - \xi' \right\|_V \sup_{f : \mathrm{osc}_V(f) \le 1} \mathrm{osc}_V(P f) .
\end{aligned}
\]

To conclude, we apply again Theorem 7.2 to obtain


\[
\begin{aligned}
\sup_{\mathrm{osc}_V(f) \le 1} \mathrm{osc}_V(P f)
&= \sup_{\mathrm{osc}_V(f) \le 1} \sup_{x,y} \frac{|P f(x) - P f(y)|}{V(x) + V(y)} \\
&= \sup_{x,y} \sup_{\mathrm{osc}_V(f) \le 1} \frac{|[P(x,\cdot) - P(y,\cdot)] f|}{V(x) + V(y)}
= \sup_{x,y} \frac{\|P(x,\cdot) - P(y,\cdot)\|_V}{V(x) + V(y)} . \qquad \Box
\end{aligned}
\]

Note that, mutatis mutandis, this proof is exactly the same as the proof of Lemma 6.18. We have seen in Corollary 7.4 that (M_{1,V}(X), d_V) is complete, and convergence with respect to the V-distance implies weak convergence. Obviously, δ_x ∈ M_{1,V}(X) for all x ∈ X, and ∆_V(P) < ∞ implies by definition that ξP ∈ M_{1,V}(X) for all ξ ∈ M_{1,V}(X). Thus we can apply Theorem 6.14 to obtain the following convergence result.

Theorem 7.7. Let P be a Markov kernel on (X, X) such that ∆_V(P) < ∞ and, for some integer m, ∆_V(P^m) ≤ ρ < 1. Then P admits a unique invariant probability π, satisfying π(V) < ∞. In addition, for all ξ ∈ M_{1,V}(X),

\[
\|\xi P^n - \pi\|_V \le \left( \max_{1 \le r \le m-1} \Delta_V(P^r) \right) \|\xi - \pi\|_V\, \rho^{\lfloor n/m \rfloor} .
\]

7.3 Drift and Minorization conditions

We now introduce conditions that ensure that the Markov kernel P (or an iterate P^m) is a strict contraction in the V-distance, i.e. ∆_V(P^m) < 1.

Definition 7.8 (Geometric drift condition) A Markov kernel P satisfies a geometric (or Foster-Lyapunov) drift condition, denoted D(V,λ,b), if there exist a measurable function V : X → [1, ∞) and constants (λ, b) ∈ (0,1) × R_+ such that

\[
P V \le \lambda V + b . \tag{7.9}
\]

The function V in (7.9) is called a drift, test, or Lyapunov function. The drift condition D(V,λ,b) implies the condition D(aV,λ,ab) for all a > 0. Therefore, without loss of generality and to avoid trivialities, we will assume whenever convenient that min_X V = 1. In that case, Condition D(V,λ,b) implies that λ + b ≥ 1.

It is sometimes easier to check the following conditions, which imply (7.9):

\[
\limsup_{R \to \infty}\ \sup_{V(x) \ge R} \frac{P V(x)}{V(x)} < 1 , \qquad \text{and, for all } R > 0, \quad \sup_{V(x) \le R} P V(x) < \infty .
\]
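As a concrete illustration (our example, not from the text): for the scalar AR(1) model X_{k+1} = φX_k + Z_{k+1} with Z ~ N(0,1) and |φ| < 1, the function V(x) = 1 + x² satisfies (7.9) with λ = φ² and b = 2 − φ², since PV(x) = 1 + φ²x² + 1. The sketch below verifies this numerically by Gauss–Hermite quadrature.

```python
import numpy as np

phi = 0.8
lam, b = phi**2, 2.0 - phi**2

# Gauss-Hermite nodes/weights for E[g(Z)], Z ~ N(0,1) (probabilists' version)
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / weights.sum()

def V(x):
    return 1.0 + x**2

def PV(x):
    """E[V(phi * x + Z)] computed by quadrature."""
    return float(np.dot(weights, V(phi * x + nodes)))

xs = np.linspace(-50.0, 50.0, 101)
drift_ok = all(PV(x) <= lam * V(x) + b + 1e-8 for x in xs)
```

Here the drift inequality is in fact an equality, PV(x) = φ²V(x) + (2 − φ²), so the quadrature check passes with machine-precision slack.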


As we will see in the examples below, it is sometimes easier to check the geometric drift condition on an iterate P^m of the kernel rather than on the kernel P itself. The following lemma shows that it is equivalent to check that P or P^m satisfies a geometric drift condition.

Lemma 7.9 Let P be a Markov kernel satisfying the drift condition D(V,λ,b). Then, for each positive integer m,

\[
P^m V \le \lambda^m V + \frac{b(1-\lambda^m)}{1-\lambda} \le \lambda^m V + \frac{b}{1-\lambda} . \tag{7.10}
\]

Conversely, if for some m ≥ 2, P^m satisfies a geometric drift condition with drift function V_m and constants (λ_m, b_m) ∈ [0,1) × R_+, then P satisfies a geometric drift condition with drift function V_m + λ_m^{−1/m} P V_m + ··· + λ_m^{−(m−1)/m} P^{m−1} V_m and constants λ_m^{1/m} and λ_m^{−(m−1)/m} b_m.

Proof. Assume that PV ≤ λV + b with λ ∈ [0,1). By straightforward induction, we obtain, for r ≥ 1,

\[
P^r V \le \lambda^r V + b \sum_{k=0}^{r-1} \lambda^k = \lambda^r V + b (1-\lambda^r)/(1-\lambda) .
\]

This proves the first part. Conversely, if P^m V_m ≤ λ_m V_m + b_m, set V = V_m + λ_m^{−1/m} P V_m + ··· + λ_m^{−(m−1)/m} P^{m−1} V_m. Then,

\[
\begin{aligned}
P V &= P V_m + \lambda_m^{-1/m} P^2 V_m + \cdots + \lambda_m^{-(m-1)/m} P^m V_m \\
&\le P V_m + \lambda_m^{-1/m} P^2 V_m + \cdots + \lambda_m^{-(m-2)/m} P^{m-1} V_m + \lambda_m^{-(m-1)/m} (\lambda_m V_m + b_m) \\
&= \lambda_m^{1/m} V + \lambda_m^{-(m-1)/m} b_m . \qquad \Box
\end{aligned}
\]

A simple corollary is a bound for the V-Dobrushin coefficients of the iterates of the kernel.

Corollary 7.10 Assume that P satisfies the D(V,λ,b) drift condition. Then, for any r ∈ N*,

\[
\Delta_V(P^r) \le \lambda^r + b\, \frac{1-\lambda^r}{1-\lambda} . \tag{7.11}
\]

Proof. For all x, x′ ∈ X, we have

\[
\frac{\|P^r(x,\cdot) - P^r(x',\cdot)\|_V}{V(x) + V(x')} \le \frac{P^r V(x) + P^r V(x')}{V(x) + V(x')} \le \lambda^r + \frac{2b(1-\lambda^r)}{(1-\lambda)\{V(x) + V(x')\}} .
\]


The bound (7.11) follows by noting that V(x) + V(x′) ≥ 2 and applying Lemma 7.6. □

The drift condition alone does not guarantee the existence of an invariant distribution. However, if one exists, then it integrates the Lyapunov function V.

Lemma 7.11 If Condition D(V,λ ,b) holds and if π is an invariant probability mea-sure for P, then π(V )< ∞.

Proof. If π is P-invariant, applying the bound (7.10) of Lemma 7.9 and the concavity of the function x ↦ x ∧ c yields, for all n ∈ N and c > 0,

\[
\pi(V \wedge c) = \pi P^n (V \wedge c) \le \pi(P^n V \wedge c) \le \pi\left( \{\lambda^n V + b/(1-\lambda)\} \wedge c \right) .
\]

Letting n and then c tend to infinity yields π(V) ≤ b/(1−λ). □

In order to prove the existence of an invariant measure, we need to introduce Doeblin sets.

Definition 7.12 (Doeblin set) Let m ≥ 1 be an integer and ε > 0. A set C ∈ X is an (m,ε)-Doeblin set if, for every (x,x′) ∈ C × C,

\[
[P^m(x,\cdot) \wedge P^m(x',\cdot)](\mathsf X) \ge \varepsilon . \tag{7.12}
\]

In words, this condition means that, for every x, x′ ∈ C, the two probabilities P^m(x,·) and P^m(x′,·) have a component in common, whose mass is at least ε.

Remark 7.13 Assume that ‖P^n(x,·) − π‖_TV ≤ V(x)ε_n, where lim_{n→∞} ε_n = 0, and define C = {V ≤ d}. Fix some η > 0. Then, for all sufficiently large m and all x, x′ ∈ C, it holds that ‖P^m(x,·) − π‖_TV ≤ η and ‖P^m(x′,·) − π‖_TV ≤ η, so that ‖P^m(x,·) − P^m(x′,·)‖_TV ≤ 2η. By Proposition 6.9, this yields that C is an (m, 1−2η)-Doeblin set. This shows that V-ergodicity at any rate implies the existence of an (m,ε)-Doeblin set.

Similarly to Lemma 6.24, we have the following sufficient condition for (7.12).

Lemma 7.14 If there exist m ∈ N*, ε > 0 and ν ∈ M_1(X) satisfying, for every x ∈ C and A ∈ X,

\[
P^m(x,A) \ge \varepsilon\, \nu(A) , \tag{7.13}
\]

then C is an (m,ε)-Doeblin set.

Remark 7.15 A set satisfying (7.13) is called a small set. We will see that small sets exist under fairly general conditions; see Section 11.1.

7.4 Quantitative bounds for the V -Dobrushin coefficient


Let C be a (1,ε)-Doeblin set for the Markov kernel P on (X, X). Let the Markov kernels R and Q on (C × C) × X be defined, for (x,x′) ∈ C × C and A ∈ X, by

\[
R(x,x';A) = \frac{[P(x,\cdot) \wedge P(x',\cdot)](A)}{[P(x,\cdot) \wedge P(x',\cdot)](\mathsf X)} , \tag{7.14}
\]
\[
Q(x,x';A) = (1-\varepsilon)^{-1} \{ P(x,A) - \varepsilon R(x,x';A) \} . \tag{7.15}
\]

The kernel R is well defined since the Doeblin condition implies that [P(x,·) ∧ P(x′,·)](X) ≥ ε. Then, for all (x,x′) ∈ C × C and A ∈ X,

\[
P(x,A) = \varepsilon R(x,x';A) + (1-\varepsilon) Q(x,x';A) , \tag{7.16}
\]

and since the kernel R is symmetric, it also holds that

\[
P(x,A) - P(x',A) = (1-\varepsilon) \{ Q(x,x';A) - Q(x',x;A) \} . \tag{7.17}
\]

For β > 0, define

\[
V_\beta = 1 + \beta(V - 1) . \tag{7.18}
\]

Lemma 7.16 Let C be a (1,ε)-Doeblin set for P. Then, for every (x,x′) ∈ C × C and β ∈ (0,1),

\[
\|\delta_x P - \delta_{x'} P\|_{V_\beta} \le 2(1-\varepsilon)(1-\beta) + 2\beta \sup_{x \in C} P V(x) .
\]

Proof. Let f ∈ F(X, X) be such that |f|_{V_β} ≤ 1, i.e. |f(x)| ≤ 1 − β + βV(x) for all x ∈ X. Equation (7.16) implies that QV(x,x′) ≤ (1−ε)^{−1} PV(x) for all x, x′ ∈ C. This and (7.17) yield, for all x, x′ ∈ C,

\[
\begin{aligned}
|P f(x) - P f(x')| &= (1-\varepsilon) \left| \int \{Q(x,x';\mathrm d z) - Q(x',x;\mathrm d z)\}\, f(z) \right| \\
&\le (1-\varepsilon) \int \{Q(x,x';\mathrm d z) + Q(x',x;\mathrm d z)\} \{(1-\beta) + \beta V(z)\} \\
&= 2(1-\varepsilon)(1-\beta) + \beta(1-\varepsilon) \{Q V(x',x) + Q V(x,x')\} \\
&\le 2(1-\varepsilon)(1-\beta) + 2\beta \sup_{x \in C} P V(x) . \qquad \Box
\end{aligned}
\]

Proposition 7.17 Let P be a Markov kernel satisfying the drift condition D(V,λ,b). Assume moreover that, for some d > 2b/(1−λ) − 1, the level set {V ≤ d} is a (1,ε)-Doeblin set. Then, for all β ∈ (0, {ε/(b + λd + ε − 1)} ∧ 1),

\[
\Delta_{V_\beta}(P) \le \gamma_1(\beta,b,\lambda,\varepsilon) \vee \gamma_2(\beta,b,\lambda) < 1 ,
\]

with

\[
\gamma_1(\beta,b,\lambda,\varepsilon) = 1 - \varepsilon + \beta(b + \lambda d + \varepsilon - 1) , \tag{7.19}
\]
\[
\gamma_2(\beta,b,\lambda) = 1 - \beta\, \frac{(1+d)(1-\lambda) - 2b}{2(1-\beta) + \beta(1+d)} . \tag{7.20}
\]

Proof. Set C = {V ≤ d}. The drift condition D(V,λ,b) implies the bound sup_{x∈C} PV(x) ≤ b + λd. This bound and Lemma 7.16 together imply that, for all (x,x′) ∈ C × C,

\[
\|\delta_x P - \delta_{x'} P\|_{V_\beta} \le 2\{(1-\varepsilon)(1-\beta) + \beta(b + \lambda d)\}
= 2\gamma_1(\beta,b,\lambda,\varepsilon) \le \gamma_1(\beta,b,\lambda,\varepsilon) \{V_\beta(x) + V_\beta(x')\} . \tag{7.21}
\]

Note that γ_1(β,b,λ,ε) < 1 for all β ∈ (0, {ε/(b + λd + ε − 1)} ∧ 1).

Consider now the case (x,x′) ∉ C × C, which implies V(x) + V(x′) > 1 + d. The function u ↦ (2(1−β) + λβu + 2βb)/(2(1−β) + βu) is decreasing on R_+ from 1 + βb/(1−β) to λ, and it is equal to 1 if u = 2b/(1−λ). Thus,

\[
\sup_{u \ge 1+d} \frac{2(1-\beta) + \lambda\beta u + 2\beta b}{2(1-\beta) + \beta u}
= \frac{2(1-\beta) + \lambda\beta(1+d) + 2\beta b}{2(1-\beta) + \beta(1+d)} = \gamma_2(\beta,b,\lambda) . \tag{7.22}
\]

Thus we see that γ_2(β,b,λ) > 0 and that γ_2(β,b,λ) < 1 if and only if 1 + d > 2b/(1−λ). By definition of the norm ‖·‖_{V_β} and by the drift condition D(V,λ,b), we have

\[
\|\delta_x P - \delta_{x'} P\|_{V_\beta} \le \|\delta_x P\|_{V_\beta} + \|\delta_{x'} P\|_{V_\beta}
= 2(1-\beta) + \beta\{P V(x) + P V(x')\}
\le 2(1-\beta) + \beta\lambda\{V(x) + V(x')\} + 2\beta b .
\]

Using (7.22) with u = V(x) + V(x′) > 1 + d, we obtain

\[
\|\delta_x P - \delta_{x'} P\|_{V_\beta} \le \gamma_2(\beta,b,\lambda) \{V_\beta(x) + V_\beta(x')\} . \tag{7.23}
\]

The bounds (7.21) and (7.23) yield, for all x, x′ ∈ X,

\[
\|\delta_x P - \delta_{x'} P\|_{V_\beta} \le \{\gamma_1(\beta,b,\lambda,\varepsilon) \vee \gamma_2(\beta,b,\lambda)\} \{V_\beta(x) + V_\beta(x')\} ,
\]

and the proof is concluded by applying Lemma 7.6. □
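Given (λ, b, d, ε), the contraction rate of Proposition 7.17 can be optimized over β numerically; the constants in the sketch below are arbitrary illustrative values chosen to satisfy d > 2b/(1−λ) − 1.

```python
import numpy as np

lam, b, eps = 0.5, 1.0, 0.1
d = 5.0                              # any d > 2b/(1 - lam) - 1 = 3 works

def gamma1(beta):
    """gamma_1 of (7.19)."""
    return 1 - eps + beta * (b + lam * d + eps - 1)

def gamma2(beta):
    """gamma_2 of (7.20)."""
    return 1 - beta * ((1 + d) * (1 - lam) - 2 * b) / (2 * (1 - beta) + beta * (1 + d))

# Admissible beta range and grid search for the best rate gamma1 v gamma2
beta_max = min(eps / (b + lam * d + eps - 1), 1.0)
betas = np.linspace(1e-6, beta_max * (1 - 1e-9), 10_000)
rates = np.maximum(gamma1(betas), gamma2(betas))
best = float(betas[np.argmin(rates)])
rate = float(rates.min())
```

Since γ₁ increases and γ₂ decreases in β, the optimum sits where the two curves cross (here a rate just below 1), which illustrates how conservative these generic drift-and-minorization bounds can be.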

Definition 7.18 (V-geometric ergodicity) Let V : X → [1,∞) be a measurable function. A Markov kernel P is said to be V-geometrically ergodic if there exist a probability measure π such that π(V) < ∞ and constants C < ∞ and ρ ∈ [0,1) satisfying, for all n ∈ N and x ∈ X,

\[
\|P^n(x,\cdot) - \pi\|_V \le C V(x) \rho^n . \tag{7.24}
\]

Note that (7.24) implies that π is an invariant probability for the Markov kernel P. Since ‖ξ‖_TV = ‖ξ‖_{1_X}, the concept of V-geometric ergodicity generalises the notion of uniform ergodicity. The upper bound for the V-norm of the difference between the n-th iterate of the Markov kernel P^n(x,·) and the invariant probability π is allowed to depend on the starting point x through the function V.

Proposition 7.17 gives conditions under which P is a contraction on M_{1,V}(X) for the distance d_{V_β}, for some β small enough. The bound β‖·‖_V ≤ ‖·‖_{V_β} ≤ ‖·‖_V implies that ∆_V(P) ≤ β^{−1}∆_{V_β}(P), but for β < 1 this does not guarantee that P is contracting with respect to the distance d_V. However, by Theorem 7.7, the contractivity property with respect to d_{V_β} implies the V_β-geometric ergodicity of P, and we will prove that V_β-geometric ergodicity for any β ∈ (0,1) implies V-geometric ergodicity.

Theorem 7.19. Let P be a Markov kernel satisfying the drift condition D(V,λ,b). Assume moreover that, for some d > 2b/(1−λ) − 1, m ∈ N* and ε ∈ (0,1), the level set {V ≤ d} is an (m,ε)-Doeblin set. Then P admits a unique invariant probability π, and P is V-geometrically ergodic. More precisely, for any β ∈ (0, {ε/(b_m + λ^m d + ε − 1)} ∧ 1), n ∈ N and x ∈ X,

\[
\|P^n(x,\cdot) - \pi\|_V \le c(\beta) \{\pi(V) + V(x)\}\, \rho^{\lfloor n/m \rfloor}(\beta) , \tag{7.25}
\]

with

\[
b_m = \frac{b}{\min_{x \in \mathsf X} V(x)}\, \frac{1-\lambda^m}{1-\lambda} , \tag{7.26}
\]
\[
c(\beta) = \beta^{-1} \{(1-\beta) + \lambda^m + b_m\} , \tag{7.27}
\]
\[
\rho(\beta) = \gamma_1(\beta, b_m, \lambda^m, \varepsilon) \vee \gamma_2(\beta, b_m, \lambda^m) , \tag{7.28}
\]

where γ_1 and γ_2 are given in (7.19) and (7.20), respectively.

Proof. Assume without loss of generality that min_{x∈X} V(x) = 1. The proof consists in checking the conditions of Theorem 7.7.

By Lemma 7.9, the Markov kernel P^m satisfies the drift condition D(V, λ^m, b_m), i.e. P^m V ≤ λ^m V + b_m. Also, {V ≤ d} is a (1,ε)-Doeblin set for P^m, and d > 2b/(1−λ) − 1 if and only if d > 2b_m/(1−λ^m) − 1, since 2b_m/(1−λ^m) − 1 = 2b/(1−λ) − 1. Thus we can apply Proposition 7.17 to P^m: for any β ∈ (0, {ε/(b_m + λ^m d + ε − 1)} ∧ 1),

\[
\Delta_{V_\beta}(P^m) \le \gamma_1(\beta, b_m, \lambda^m, \varepsilon) \vee \gamma_2(\beta, b_m, \lambda^m) < 1 .
\]

Next, applying Lemma 7.6 and Corollary 7.10, we obtain, for r ∈ {1, . . . , m−1},

\[
\begin{aligned}
\Delta_{V_\beta}(P^r) &= \sup_{(x,x') \in \mathsf X \times \mathsf X} \frac{\|P^r(x,\cdot) - P^r(x',\cdot)\|_{V_\beta}}{V_\beta(x) + V_\beta(x')}
\le \sup_{(x,x') \in \mathsf X \times \mathsf X} \frac{P^r V_\beta(x) + P^r V_\beta(x')}{V_\beta(x) + V_\beta(x')} \\
&= \sup_{(x,x') \in \mathsf X \times \mathsf X} \frac{2(1-\beta) + \beta\{P^r V(x) + P^r V(x')\}}{V_\beta(x) + V_\beta(x')} \\
&\le \sup_{(x,x') \in \mathsf X \times \mathsf X} \frac{2(1-\beta) + \beta\left[\lambda^r \{V(x) + V(x')\} + 2b(1-\lambda^r)/(1-\lambda)\right]}{2(1-\beta) + \beta\{V(x) + V(x')\}} \\
&\le (1-\beta) + \beta\lambda^r + b(1-\lambda^r)/(1-\lambda) .
\end{aligned}
\]

The condition PV ≤ λV + b and min_X V = 1 imply that 1 ≤ λ + b. This in turn implies that the right-hand side of the previous equation is increasing with respect to r and thus is bounded by βc(β) for all r = 1, . . . , m. We now apply Theorem 7.7 to obtain that there exists a unique invariant probability π such that π(V_β) < ∞ and, for any n ∈ N and ξ ∈ M_{1,V_β}(X), we have

\[
\|\xi P^n - \pi\|_{V_\beta} \le \max_{1 \le r < m} \Delta_{V_\beta}(P^r)\, \|\xi - \pi\|_{V_\beta}\, \rho^{\lfloor n/m \rfloor}(\beta)
\le \beta c(\beta) \|\xi - \pi\|_{V_\beta}\, \rho^{\lfloor n/m \rfloor}(\beta) .
\]

Since ‖·‖Vβ= (1−β )‖·‖TV+β ‖·‖V and ‖·‖TV ≤ ‖·‖V, we have β ‖·‖V ≤ ‖·‖Vβ

≤‖·‖V. Thus,

‖ξ Pn−π‖V ≤ β−1 ‖ξ Pn−π‖Vβ

≤ c(β ) ‖ξ −π‖V ρbn/mc(β ) .

Choosing ξ = δx yields (7.25). 2

7.5 Central Limit Theorem

In this section we establish a central limit theorem for additive functionals of a Markov chain,

\[
S_n(f) := \sum_{k=0}^{n-1} f(X_k) , \tag{7.29}
\]

where f is a function belonging to some class of functions to be specified.

Lemma 7.20 Let $P$ be a Markov kernel and $\pi \in \mathcal{M}_1(\mathsf{X})$. Let $V : \mathsf{X} \to [1,\infty)$ be a measurable function. Assume that the Markov kernel $P$ is $V$-geometrically ergodic. Then, for any $\alpha \in (0,1]$, $P$ is $V^\alpha$-geometrically ergodic; more precisely,

\[ \|P^n(x,\cdot)-\pi\|_{V^\alpha} \le 2^{1-\alpha} C^\alpha \rho^{\alpha n}\, V^\alpha(x) \quad \text{for all } n \in \mathbb{N} \text{ and } x \in \mathsf{X} \;, \qquad (7.30) \]


where C and ρ are defined in (7.24).

Proof. By Jensen's inequality, we get, for $\xi \in \mathcal{M}_0(\mathsf{X}) \cap \mathcal{M}_V(\mathsf{X})$,
\[
\|\xi\|_{V^\alpha} = |\xi|(\mathsf{X})\, \frac{|\xi|(V^\alpha)}{|\xi|(\mathsf{X})} \le |\xi|(\mathsf{X}) \left( \frac{|\xi|(V)}{|\xi|(\mathsf{X})} \right)^{\alpha} = |\xi|^{1-\alpha}(\mathsf{X})\, \|\xi\|_V^{\alpha} \;.
\]
The result follows by applying this identity to $\xi = P^n(x,\cdot) - \pi$ and noting that $|\xi|(\mathsf{X}) \le 2$. $\Box$

Lemma 7.21 Assume that $P$ is $V$-geometrically ergodic (with constants $C$ and $\rho$). Then,

(i) If $|f|_{V^{1/2}} < \infty$ and $\pi(f) = 0$, then
\[ \big|P^k f\big|_{L^2(\pi)} \le (2C\pi(V))^{1/2}\, |f|_{V^{1/2}}\, \rho^{k/2} \;. \]
(ii) If $f \in L_0^{2+\delta}(\pi)$ for some $\delta > 0$, then
\[ \big|P^k f\big|_{L^2(\pi)} \le \big\{(8C\pi(V))^{1/2} + 2\big\}\, |f|^{1+\delta/2}_{L^{2+\delta}(\pi)}\, \rho^{\delta k/(4+2\delta)} \;. \]

Proof. By Lemma 7.20, setting $W = V^{1/2}$, $K = 2^{1/2}C^{1/2}$ and $\gamma = \rho^{1/2}$, we have $\|P^n(x,\cdot)-\pi\|_W \le K W(x)\gamma^n$ for every $x \in \mathsf{X}$ and $n \in \mathbb{N}$.

(i) For $f$ satisfying $|f|_W < \infty$ and $\pi(f) = 0$, we get
\[ |P^k f(x)| \le \|\delta_x P^k - \pi\|_W\, |f|_W \le K W(x)\, |f|_W\, \gamma^k \;, \]
showing that $\big|P^k f\big|_{L^2(\pi)} \le K\, |W|_{L^2(\pi)}\, |f|_W\, \gamma^k$.

(ii) We have, for $M > 1$, $f = g_M + h_M$, where
\[ g_M = f\mathbb{1}_{\{|f|\le M\}} - \pi(f\mathbb{1}_{\{|f|\le M\}}) \;, \qquad h_M = f\mathbb{1}_{\{|f|>M\}} + \pi(f\mathbb{1}_{\{|f|\le M\}}) \;. \]

Since $\pi(g_M) = 0$, $|g_M|_\infty \le 2M$ and $|g_M|_W \le |g_M|_\infty$, we get $|P^k g_M(x)| \le 2KM\,W(x)\,\gamma^k$, which implies
\[ \big|P^k g_M\big|_{L^2(\pi)} \le 2KM\, |W|_{L^2(\pi)}\, \gamma^k \;. \]

Note that $\big|f\mathbb{1}_{\{|f|>M\}}\big|_{L^1(\pi)} \le \big|f\mathbb{1}_{\{|f|>M\}}\big|_{L^2(\pi)}$. On the other hand, since $M > 1$, we get $M^\delta\, \pi(f^2\mathbb{1}_{\{|f|>M\}}) \le \pi(|f|^{2+\delta})$, showing that
\[ \big|f\mathbb{1}_{\{|f|>M\}}\big|_{L^2(\pi)} \le M^{-\delta/2}\, |f|^{1+\delta/2}_{L^{2+\delta}(\pi)} \;. \]
Since $\pi(f) = 0$, we get $\pi(f\mathbb{1}_{\{|f|\le M\}}) = -\pi(f\mathbb{1}_{\{|f|>M\}})$, which implies $|\pi(f\mathbb{1}_{\{|f|\le M\}})| \le \big|f\mathbb{1}_{\{|f|>M\}}\big|_{L^2(\pi)}$. By combining these two inequalities, we therefore obtain

\[ \big|P^k h_M\big|_{L^2(\pi)} \le |h_M|_{L^2(\pi)} \le 2M^{-\delta/2}\, |f|^{1+\delta/2}_{L^{2+\delta}(\pi)} \;. \]

Hence, for all $k$ and all $M > 1$,
\[ \big|P^k f\big|_{L^2(\pi)} \le \big|P^k g_M\big|_{L^2(\pi)} + \big|P^k h_M\big|_{L^2(\pi)} \le 2KM\, |W|_{L^2(\pi)}\, \gamma^k + 2M^{-\delta/2}\, |f|^{1+\delta/2}_{L^{2+\delta}(\pi)} \;. \]

Choosing $M = \gamma^{-k/(1+\delta/2)}$, we obtain
\[ \big|P^k f\big|_{L^2(\pi)} \le 2K\, |W|_{L^2(\pi)}\, \gamma^{\delta k/(2+\delta)} + 2\, |f|^{1+\delta/2}_{L^{2+\delta}(\pi)}\, \gamma^{\delta k/(2+\delta)} \;. \qquad \Box \]

Theorem 7.22. Assume that the Markov kernel $P$ is $V$-geometrically ergodic (Definition 7.18) with invariant distribution $\pi$. Then, for all $\xi \in \mathcal{M}_{1,V}(\mathsf{X})$ and all $f \in \mathrm{F}(\mathsf{X},\mathcal{X})$ satisfying either $|f|_{V^{1/2}} < \infty$ or $f \in L^{2+\delta}(\pi)$ for some $\delta > 0$, we have

\[ n^{-1/2} \sum_{k=0}^{n-1} \{f(X_k) - \pi(f)\} \Longrightarrow \mathrm{N}\big(0, \sigma^2(f)\big) \;, \qquad (7.31) \]

where
\[
\sigma^2(f) = \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}_\pi\left[ \left( \sum_{k=0}^{n-1} \{f(X_k) - \pi(f)\} \right)^{\!2}\, \right] = \mathrm{Var}_\pi(f) + 2\sum_{k=1}^{\infty} \mathrm{Cov}_\pi\big(f(X_0), f(X_k)\big) \;.
\]

Proof. If $\xi = \pi$, the result follows from Lemma 5.33 and Lemma 7.21. It extends to arbitrary $\xi \in \mathcal{M}_{1,V}(\mathsf{X})$ by Theorem 5.34. $\Box$
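As a quick numerical illustration of (7.31) (not part of the text), the sketch below estimates $\sigma^2(f)$ by the batch-means method for a linear AR(1) chain $X_{k+1} = \phi X_k + Z_{k+1}$ with standard Gaussian noise and $f(x) = x$, an illustrative case where the limit is known in closed form, $\sigma^2(f) = (1+\phi)/\{(1-\phi)(1-\phi^2)\}$.

```python
import random

random.seed(1)
phi = 0.5                       # AR(1): X_{k+1} = phi * X_k + Z_{k+1}, Z ~ N(0,1)
n, batch = 200_000, 1_000       # batch-means estimate of sigma^2(f), f(x) = x

x, xs = 0.0, []
for _ in range(n):
    x = phi * x + random.gauss(0.0, 1.0)
    xs.append(x)

mu = sum(xs) / n
means = [sum(xs[i:i + batch]) / batch for i in range(0, n, batch)]
sigma2_hat = batch * sum((m - mu) ** 2 for m in means) / (len(means) - 1)

# Exact asymptotic variance: Var_pi(f) + 2 * sum_k Cov_pi(f(X_0), f(X_k))
sigma2 = (1.0 + phi) / ((1.0 - phi) * (1.0 - phi ** 2))
```

For $\phi = 1/2$ the exact value is $4$, and the batch-means estimate is consistent with it up to Monte Carlo error.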

7.6 Coupling bounds

In this section, we obtain bounds for the rate of convergence by means of a coupling construction.

Let $P$ be a Markov kernel on a state space $\mathsf{X}$ and let $\xi, \xi' \in \mathcal{M}_1(\mathsf{X})$. Assume that there exist $\varepsilon \in (0,1)$ and a $(1,\varepsilon)$-Doeblin set $C$ such that $[P(x,\cdot) \wedge P(x',\cdot)](\mathsf{X}) \ge \varepsilon$ for every $(x,x') \in C \times C$. We are going to build a coupling of two Markov chains with kernel $P$ and initial distributions $\xi$ and $\xi'$.

Define the product space $\mathsf{Z} = \mathsf{X} \times \mathsf{X} \times \{0,1\}$ and the associated product $\sigma$-field $\mathcal{Z} = \mathcal{X} \otimes \mathcal{X} \otimes \mathcal{P}(\{0,1\})$. We will define on the space $(\mathsf{Z}^{\mathbb{N}}, \mathcal{Z}^{\otimes\mathbb{N}})$ a Markov chain $\{(X_n, X'_n, D_n),\, n \ge 0\}$. Here $D_n$ is called a bell variable: it indicates whether the coupling of $\{X_n,\, n \in \mathbb{N}\}$ and $\{X'_n,\, n \in \mathbb{N}\}$ has occurred by time $n$ ($D_n = 1$) or not


($D_n = 0$). Let $R$ and $Q$ be the Markov kernels on $C \times C \times \mathcal{X}$ defined in (7.14) and (7.15), and let $\bar{P}$ be any Markov kernel on $(\mathsf{X}\times\mathsf{X}, \mathcal{X}\otimes\mathcal{X})$ with the following properties: for every $A \in \mathcal{X}$,
\[
\begin{aligned}
\bar{P}(x,x';A\times\mathsf{X}) &= \bar{P}(x',x;\mathsf{X}\times A) = P(x,A) &&\text{if } (x,x') \notin C\times C \;, \qquad (7.32\mathrm{a})\\
\bar{P}(x,x';A\times\mathsf{X}) &= \bar{P}(x',x;\mathsf{X}\times A) = Q(x,x';A) &&\text{if } (x,x') \in C\times C \;. \qquad (7.32\mathrm{b})
\end{aligned}
\]

An example of such a kernel $\bar{P}$ is defined, for $A, B \in \mathcal{X}$, by
\[
\begin{aligned}
\bar{P}(x,x';A\times B) &= P(x,A)\,P(x',B) &&\text{if } (x,x') \notin C\times C \;,\\
\bar{P}(x,x';A\times B) &= Q(x,x';A)\,Q(x',x;B) &&\text{if } (x,x') \in C\times C \;.
\end{aligned}
\]
This is just one of many possibilities and not necessarily the most appropriate one for specific examples.

Set $D_0 = 0$ and sample random variables $(X_0, X'_0)$ in such a way that $X_0 \sim \xi$ and $X'_0 \sim \xi'$. Then, inductively for $k = 0, 1, 2, \ldots$, given $X_k$ and $X'_k$, we construct $X_{k+1}$ and $X'_{k+1}$ as follows:

1. If $D_k = 1$ (the processes are coupled), then sample $X_{k+1} \sim P(X_k,\cdot)$ and set $X'_{k+1} = X_{k+1}$ and $D_{k+1} = 1$.
2. If $D_k = 0$, then:

(a) If $(X_k, X'_k) \in C\times C$, then draw an independent Bernoulli random variable $D_{k+1}$ with probability of success $\varepsilon$ and, according to the result of this Bernoulli trial, take one of the following actions:
(i) If $D_{k+1} = 1$ (the attempt to couple the two processes $\{X_k,\, k\in\mathbb{N}\}$ and $\{X'_k,\, k\in\mathbb{N}\}$ is successful), sample $X_{k+1}$, conditionally independently of $\{(X_\ell, X'_\ell),\, \ell < k\}$ given $(X_k, X'_k)$, from the Markov kernel $R$,
\[ X_{k+1} \sim R(X_k, X'_k; \cdot) \;, \]
and set $X'_{k+1} = X_{k+1}$.
(ii) If $D_{k+1} = 0$ (coupling has not occurred), sample $(X_{k+1}, X'_{k+1})$, conditionally independently of $\{(X_\ell, X'_\ell),\, \ell < k\}$ given $(X_k, X'_k)$, from the kernel $\bar{P}$,
\[ (X_{k+1}, X'_{k+1}) \sim \bar{P}(X_k, X'_k; \cdot) \;. \]

(b) If $(X_k, X'_k) \notin C\times C$, then set $D_{k+1} = 0$ and sample $(X_{k+1}, X'_{k+1})$, conditionally independently of $\{(X_\ell, X'_\ell),\, \ell < k\}$ given $(X_k, X'_k)$, from the kernel $\bar{P}$,
\[ (X_{k+1}, X'_{k+1}) \sim \bar{P}(X_k, X'_k; \cdot) \;. \]

By construction, $\{(X_n, X'_n, D_n),\, n \in \mathbb{N}\}$ is a Markov chain. Denote by $\mathbb{P}_{\xi\otimes\xi'\otimes\delta_0}$ the distribution of this Markov chain on $(\mathsf{Z}^{\mathbb{N}}, \mathcal{Z}^{\otimes\mathbb{N}})$ and by $\mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}$ the associated


expectation operator. By construction, the two processes $\{X_n,\, n\in\mathbb{N}\}$ and $\{X'_n,\, n\in\mathbb{N}\}$ are Markov chains with initial distributions $\xi$ and $\xi'$, respectively.
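The construction above can be sketched in code for a toy two-state kernel. The kernels $R$ and $Q$ of (7.14)–(7.15) are not restated in this excerpt, so the sketch assumes the standard choices $R(x,x';\cdot) = \nu$ and the residual kernel $Q(x,\cdot) = \{P(x,\cdot) - \varepsilon\nu\}/(1-\varepsilon)$ (compare (7.55)); with $C$ the whole space, case 2(b) never occurs. All numerical values are illustrative.

```python
import random

random.seed(0)
P = [[0.9, 0.1], [0.2, 0.8]]          # two-state kernel; C = {0,1} is a Doeblin set
eps, nu = 0.3, [2/3, 1/3]             # minorization: P(x, .) >= eps * nu for all x
Q = [[(P[x][y] - eps * nu[y]) / (1 - eps) for y in (0, 1)] for x in (0, 1)]

def draw(p):
    return 0 if random.random() < p[0] else 1

def coupled_step(x, xp, d):
    if d == 1:                        # coupled: move together under P
        x = xp = draw(P[x])
    elif random.random() < eps:       # bell rings: draw from R = nu and merge
        d, x = 1, draw(nu)
        xp = x
    else:                             # residual move: independent draws from Q
        x, xp = draw(Q[x]), draw(Q[xp])
    return x, xp, d

def coupled_chain(x, xp, n):
    d, path = 0, []
    for _ in range(n):
        x, xp, d = coupled_step(x, xp, d)
        path.append((x, xp, d))
    return path

# Marginal sanity check: since eps*nu + (1-eps)*Q(x,.) = P(x,.), X_n ~ delta_0 P^n
N, n = 20_000, 5
hits = sum(coupled_chain(0, 1, n)[-1][0] == 0 for _ in range(N))
exact = [1.0, 0.0]
for _ in range(n):
    exact = [exact[0] * P[0][0] + exact[1] * P[1][0],
             exact[0] * P[0][1] + exact[1] * P[1][1]]
```

By construction, the marginal law of $\{X_n\}$ is that of the $P$-chain started from $\xi$, and once $D_n = 1$ the two components coincide forever.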

Proposition 7.23 For all $n \in \mathbb{N}$ and $A \in \mathcal{X}$,
\[
\begin{aligned}
\mathbb{P}_{\xi\otimes\xi'\otimes\delta_0}\big(X_{n+1} \in A \,\big|\, \mathcal{F}^X_n\big) &= P(X_n, A) \;, \qquad (7.33)\\
\mathbb{P}_{\xi\otimes\xi'\otimes\delta_0}\big(X'_{n+1} \in A \,\big|\, \mathcal{F}^{X'}_n\big) &= P(X'_n, A) \;. \qquad (7.34)
\end{aligned}
\]

Proof. We only prove (7.33). Define $\mathcal{F}_n = \sigma\big((X_k, X'_k, D_k),\, k \le n\big)$. By construction, we have
\[
\begin{aligned}
\mathbb{P}_{\xi\otimes\xi'\otimes\delta_0}\big(X_{n+1} \in A \,\big|\, \mathcal{F}_n\big)
&= P(X_n,A)\,\mathbb{1}_{\{D_n=1\}}\\
&\quad + \mathbb{1}_{\{D_n=0\}}\,\mathbb{1}_{C\times C}(X_n,X'_n)\,\big\{\varepsilon R(X_n,X'_n;A) + (1-\varepsilon)\bar{P}(X_n,X'_n;A\times\mathsf{X})\big\}\\
&\quad + \mathbb{1}_{\{D_n=0\}}\,\mathbb{1}_{(C\times C)^c}(X_n,X'_n)\,\bar{P}(X_n,X'_n;A\times\mathsf{X}) \;.
\end{aligned}
\]

Recalling the relation (7.16) between the kernels $P$, $Q$ and $R$ and the property (7.32a) of the kernel $\bar{P}$, we obtain
\[
\mathbb{P}_{\xi\otimes\xi'\otimes\delta_0}\big(X_{n+1} \in A \,\big|\, \mathcal{F}_n\big)
= P(X_n,A)\,\big\{\mathbb{1}_{\{D_n=1\}} + \mathbb{1}_{\{D_n=0\}}\mathbb{1}_{C\times C}(X_n,X'_n) + \mathbb{1}_{\{D_n=0\}}\mathbb{1}_{(C\times C)^c}(X_n,X'_n)\big\} = P(X_n,A) \;. \qquad \Box
\]

The coupling construction allows us to obtain estimates of $\|\xi P^n - \xi' P^n\|_V$. The next result parallels Theorem 6.26, with the total variation distance replaced by the norm $\|\cdot\|_V$.

Theorem 7.24. Assume that the transition kernel $P$ admits a $(1,\varepsilon)$-Doeblin set. Then, for any initial distributions $\xi, \xi' \in \mathcal{M}_1(\mathsf{X})$ and any measurable function $V : \mathsf{X} \to [1,\infty)$,
\[ \|\xi P^n - \xi' P^n\|_V \le 2\,\mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\big[\bar{V}(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}}\big] \;, \qquad (7.35) \]
where $\bar{V}$ is any measurable function on $\mathsf{X}\times\mathsf{X}$ such that, for all $(x,x') \in \mathsf{X}\times\mathsf{X}$,
\[ \frac{V(x)+V(x')}{2} \le \bar{V}(x,x') \;. \qquad (7.36) \]

Remark 7.25 A possible choice for $\bar{V}$ is $\bar{V}(x,x') = \{V(x)+V(x')\}/2$, but other choices may lead to better bounds in (7.35).

Proof. Let $f \in \mathrm{F}(\mathsf{X},\mathcal{X})$ be a function such that $|f|_V < \infty$. Since $X_n = X'_n$ whenever $D_n = 1$, we have $[f(X_n) - f(X'_n)]\mathbb{1}_{\{D_n=1\}} = 0$. On the other hand, by definition of the norm $\|\cdot\|_V$ and by the assumption on the function $\bar{V}$,
\[ |f(X_n) - f(X'_n)|\,\mathbb{1}_{\{D_n=0\}} \le 2\,\bar{V}(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}} \;. \]

Hence, using Proposition 7.23, we get
\[
|\xi P^n f - \xi' P^n f| = \big|\mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}[f(X_n) - f(X'_n)]\big| = \big|\mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\big[\{f(X_n) - f(X'_n)\}\mathbb{1}_{\{D_n=0\}}\big]\big| \le 2\,\mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\big[\bar{V}(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}}\big] \;. \qquad \Box
\]

For any probability measures $\xi, \xi' \in \mathcal{M}_1(\mathsf{X})$, let $\bar{\mathbb{P}}_{\xi\otimes\xi'}$ be the probability measure on the canonical space $(\mathsf{X}^{\mathbb{N}}\times\mathsf{X}^{\mathbb{N}}, \mathcal{X}^{\otimes\mathbb{N}}\otimes\mathcal{X}^{\otimes\mathbb{N}})$ associated to the initial measure $\xi\otimes\xi'$ and the transition kernel $\bar{P}$ defined in (7.32), and let $\{(X_n, X'_n),\, n\in\mathbb{N}\}$ be the coordinate process on $\mathsf{X}^{\mathbb{N}}\times\mathsf{X}^{\mathbb{N}}$.

Proposition 7.26 Assume that $C$ is a $(1,\varepsilon)$-Doeblin set. Let $\xi, \xi' \in \mathcal{M}_1(\mathsf{X})$ and let $V : \mathsf{X}\to[1,\infty)$ be a measurable function. Then, for $n \ge 0$,
\[ \|\xi P^n - \xi' P^n\|_V \le 2\,\bar{\mathbb{E}}_{\xi\otimes\xi'}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\big] \;, \qquad (7.37) \]
where
\[ \eta_n = \sum_{k=0}^{n} \mathbb{1}_{C\times C}(X_k, X'_k) \qquad (7.38) \]
is the number of visits to $C\times C$ up to time $n$ (with the convention $\eta_{-1} = 0$) and $\bar{V}$ satisfies (7.36).

Remark 7.27 It must be noted that $\bar{\mathbb{P}}_{\xi\otimes\xi'}$ is not a coupling of the Markov chains with kernel $P$ and initial distributions $\xi$ and $\xi'$, and that $\eta_n$ is not a coupling time.

Proof. By Theorem 7.24, it suffices to prove that
\[ \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\big[\bar{V}(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}}\big] = \bar{\mathbb{E}}_{\xi\otimes\xi'}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\big] \;. \]

We shall prove by induction that, for any $n \ge 0$ and any sequence of positive measurable functions $\{f_j\}_{j\ge 0}$,
\[ \Delta_n := \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,\mathbb{1}_{\{D_n=0\}}\right] = \bar{\mathbb{E}}_{\xi\otimes\xi'}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,(1-\varepsilon)^{\eta_{n-1}}\right] \;. \qquad (7.39) \]
This is true for $n = 0$. The key ingredient is to note that, by construction of the kernel $\bar{P}$, for a bounded measurable function $f$ defined on $\mathsf{X}\times\mathsf{X}$,

\[ \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\big[f(X_{n+1}, X'_{n+1})\,\mathbb{1}_{\{D_{n+1}=0\}} \,\big|\, \mathcal{F}_n\big]\,\mathbb{1}_{\{D_n=0\}} = \big\{1 - \varepsilon\mathbb{1}_{C\times C}(X_n, X'_n)\big\}\,\bar{P}f(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}} \;, \]
where $\mathcal{F}_n$ is a shorthand notation for $\mathcal{F}^{(X,X',D)}_n$. This yields


\[
\begin{aligned}
\Delta_{n+1} &= \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\left[\prod_{j=0}^{n+1} f_j(X_j, X'_j)\,\mathbb{1}_{\{D_{n+1}=0\}}\right]\\
&= \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,\mathbb{E}\big[f_{n+1}(X_{n+1}, X'_{n+1})\,\mathbb{1}_{\{D_{n+1}=0\}} \,\big|\, \mathcal{F}_n\big]\,\mathbb{1}_{\{D_n=0\}}\right]\\
&= \mathbb{E}_{\xi\otimes\xi'\otimes\delta_0}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,\big\{1-\varepsilon\mathbb{1}_{C\times C}(X_n, X'_n)\big\}\,\bar{P}f_{n+1}(X_n, X'_n)\,\mathbb{1}_{\{D_n=0\}}\right] \;.
\end{aligned}
\]

Applying the induction assumption with the function $f_n$ replaced by $(1-\varepsilon)^{\mathbb{1}_{C\times C}} f_n\, \bar{P}f_{n+1}$, we obtain
\[ \Delta_{n+1} = \bar{\mathbb{E}}_{\xi\otimes\xi'}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,\bar{P}f_{n+1}(X_n, X'_n)\,(1-\varepsilon)^{\mathbb{1}_{C\times C}(X_n,X'_n)}\,(1-\varepsilon)^{\eta_{n-1}}\right] \;. \]

Note now that $(1-\varepsilon)^{\mathbb{1}_{C\times C}(X_n,X'_n)}(1-\varepsilon)^{\eta_{n-1}} = (1-\varepsilon)^{\eta_n}$ and that $\eta_n$ is $\mathcal{F}_n$-measurable. This yields

\[
\Delta_{n+1} = \bar{\mathbb{E}}_{\xi\otimes\xi'}\left[\prod_{j=0}^{n} f_j(X_j, X'_j)\,\bar{P}f_{n+1}(X_n, X'_n)\,(1-\varepsilon)^{\eta_n}\right] = \bar{\mathbb{E}}_{\xi\otimes\xi'}\left[\prod_{j=0}^{n+1} f_j(X_j, X'_j)\,(1-\varepsilon)^{\eta_n}\right] \;.
\]
This concludes the induction. $\Box$

The next task is to obtain more explicit bounds for the expectation on the right-hand side of (7.37). We first achieve this when the function $\bar{V}$ satisfies a suitable bivariate drift condition.

Lemma 7.28 Assume that $C$ is a $(1,\varepsilon)$-Doeblin set for the kernel $P$ and that
\[ \bar{P}\bar{V} \le \bar{\lambda}\bar{V}\mathbb{1}_{(C\times C)^c} + \bar{b}\mathbb{1}_{C\times C} \qquad (7.40) \]
for some $\bar{\lambda} \in [0,1)$. Then, for each $j \in \{0, \ldots, n\}$ and $(x,x') \in \mathsf{X}\times\mathsf{X}$,
\[
\begin{aligned}
\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[(1-\varepsilon)^{\eta_{n-1}}\big] &\le \bar{\lambda}^n B^{j-1}\,\bar{V}(x,x') + (1-\varepsilon)^j \;, \qquad (7.41)\\
\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\big] &\le \bar{\lambda}^n B^{j-1}\,\bar{V}(x,x') + (1-\varepsilon)^j\left(\bar{\lambda}^n\bar{V}(x,x') + \frac{\bar{b}(1-\bar{\lambda}^n)}{1-\bar{\lambda}}\right) \;, \qquad (7.42)
\end{aligned}
\]
where
\[ B = 1 \vee \big[\bar{b}(1-\varepsilon)/\bar{\lambda}\big] \;. \qquad (7.43) \]

Proof. For $j \in \{0, \ldots, n\}$ and $(x,x') \in \mathsf{X}\times\mathsf{X}$, Lemma 7.9 yields
\[
\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\mathbb{1}_{\{\eta_{n-1}\ge j\}}\big] \le (1-\varepsilon)^j\,\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[\bar{V}(X_n, X'_n)\big] \le \big[\bar{\lambda}^n\bar{V}(x,x') + \bar{b}(1-\bar{\lambda}^n)/(1-\bar{\lambda})\big](1-\varepsilon)^j \;. \qquad (7.44)
\]

For $k \ge 0$, define $Z_k = \bar{\lambda}^{-k}\big[(1-\varepsilon)/B\big]^{\eta_{k-1}}\,\bar{V}(X_k, X'_k)$. For $k \ge 1$, since $\eta_{k-1}$ is $\mathcal{F}^{(X,X')}_{k-1}$-measurable, we obtain
\[
\begin{aligned}
\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[Z_k \,\big|\, \mathcal{F}^{(X,X')}_{k-1}\big] &= \bar{\lambda}^{-k}\,\bar{P}\bar{V}(X_{k-1}, X'_{k-1})\,\big[(1-\varepsilon)/B\big]^{\eta_{k-1}}\\
&\le \bar{\lambda}^{-k+1}\,\bar{V}(X_{k-1}, X'_{k-1})\,\big[(1-\varepsilon)/B\big]^{\eta_{k-1}}\,\mathbb{1}_{(C\times C)^c}(X_{k-1}, X'_{k-1})\\
&\quad + \bar{\lambda}^{-k}\,\bar{b}\,\big[(1-\varepsilon)/B\big]^{\eta_{k-1}}\,\mathbb{1}_{C\times C}(X_{k-1}, X'_{k-1}) \;.
\end{aligned}
\]
Using the relations $\eta_{k-1} = \eta_{k-2} + \mathbb{1}_{C\times C}(X_{k-1}, X'_{k-1})$ and $\bar{b}(1-\varepsilon) \le B\bar{\lambda}$, together with $\bar{V} \ge 1$, we find that, for all $k \ge 1$,
\[ \bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[Z_k \,\big|\, \mathcal{F}^{(X,X')}_{k-1}\big] \le Z_{k-1} \;. \]

This means that $\{(Z_k, \mathcal{F}^{(X,X')}_k),\, k\in\mathbb{N}\}$ is a positive $\bar{\mathbb{P}}_{\delta_x\otimes\delta_{x'}}$-supermartingale. Therefore, for all $n \ge 0$, $\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}[Z_n] \le \bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}[Z_0] = \bar{V}(x,x')$. Since $B \ge 1$, this yields
\[
\begin{aligned}
\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\mathbb{1}_{\{\eta_{n-1}<j\}}\big]
&\le \bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[\bar{V}(X_n, X'_n)\,\{(1-\varepsilon)/B\}^{\eta_{n-1}} B^{\eta_{n-1}}\mathbb{1}_{\{\eta_{n-1}<j\}}\big]\\
&\le \bar{\lambda}^n B^{j-1}\,\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}[Z_n] \le \bar{\lambda}^n B^{j-1}\,\bar{V}(x,x') \;. \qquad (7.45)
\end{aligned}
\]
Gathering the bounds (7.44) and (7.45) yields (7.42). To obtain (7.41), replace (7.44) by the trivial bound $\bar{\mathbb{E}}_{\delta_x\otimes\delta_{x'}}\big[(1-\varepsilon)^{\eta_{n-1}}\mathbb{1}_{\{\eta_{n-1}\ge j\}}\big] \le (1-\varepsilon)^j$. $\Box$

For probability measures $\xi, \xi' \in \mathcal{M}_1(\mathsf{X})$, integrating the bounds (7.41) and (7.42) yields
\[
\begin{aligned}
\bar{\mathbb{E}}_{\xi\otimes\xi'}\big[(1-\varepsilon)^{\eta_{n-1}}\big] &\le \bar{\lambda}^n B^{j-1}\,\xi\otimes\xi'(\bar{V}) + (1-\varepsilon)^j \;, \qquad (7.46)\\
\bar{\mathbb{E}}_{\xi\otimes\xi'}\big[\bar{V}(X_n, X'_n)\,(1-\varepsilon)^{\eta_{n-1}}\big] &\le \bar{\lambda}^n B^{j-1}\,\xi\otimes\xi'(\bar{V}) + (1-\varepsilon)^j\left(\bar{\lambda}^n\,\xi\otimes\xi'(\bar{V}) + \frac{\bar{b}(1-\bar{\lambda}^n)}{1-\bar{\lambda}}\right) \;. \qquad (7.47)
\end{aligned}
\]

If $P$ has an invariant distribution $\pi$, then we can choose $\xi' = \pi$, and Proposition 7.26 and Lemma 7.28 yield bounds for $\|\xi P^n - \pi\|_{\mathrm{TV}}$ and $\|\xi P^n - \pi\|_V$ for appropriately defined functions $\bar{V}$.

Theorem 7.29. Assume that $C$ is a $(1,\varepsilon)$-Doeblin set for the kernel $P$. Let $\bar{V}$ be a function defined on $\mathsf{X}\times\mathsf{X}$ such that (7.40) holds for some $\bar{\lambda} \in [0,1)$. Assume in addition that $P$ admits an invariant distribution $\pi$. Let $V : \mathsf{X}\to[1,\infty)$ be a function satisfying $V(x)+V(x') \le 2\bar{V}(x,x')$ for all $(x,x') \in \mathsf{X}\times\mathsf{X}$. Then, for all $\xi \in \mathcal{M}_1(\mathsf{X})$ and $j \in \{0, \ldots, n\}$,


\[
\begin{aligned}
\|\xi P^n - \pi\|_{\mathrm{TV}} &\le 2(1-\varepsilon)^j + 2\bar{\lambda}^n B^{j-1}\,\xi\otimes\pi(\bar{V}) \;,\\
\|\xi P^n - \pi\|_V &\le 2(1-\varepsilon)^j\left(\bar{\lambda}^n\,\xi\otimes\pi(\bar{V}) + \frac{\bar{b}}{1-\bar{\lambda}}\right) + 2\bar{\lambda}^n B^{j-1}\,\xi\otimes\pi(\bar{V}) \;.
\end{aligned}
\]

On a logarithmic scale, the constants disappear and the rate of convergence is the same in the total variation norm and in the $V$-norm.

Corollary 7.30 Under the assumptions of Theorem 7.29,
\[
\limsup_{n\to\infty}\; n^{-1}\log\|P^n(x,\cdot)-\pi\|_V \le
\begin{cases}
-\dfrac{\log(\bar{\lambda})\log(1-\varepsilon)}{\log(\bar{b})-\log(\bar{\lambda})} & \text{if } \bar{b}(1-\varepsilon)/\bar{\lambda} \ge 1 \;,\\[1ex]
\log(\bar{\lambda}) & \text{otherwise.}
\end{cases}
\]

Proof. Assume first that $\bar{b}(1-\varepsilon) \ge \bar{\lambda}$, which implies that $B = \bar{b}(1-\varepsilon)/\bar{\lambda}$. Then, defining $j$ by
\[ j = \left\lfloor \frac{-\log(\bar{\lambda})\,n}{\log(\bar{b})-\log(\bar{\lambda})} \right\rfloor \]
yields the first case. If $\bar{b}(1-\varepsilon)\bar{\lambda}^{-1} < 1$, then $B = 1$ and, since condition (7.40) implies that $\bar{b} \ge 1$, it also holds that $1-\varepsilon < \bar{\lambda}$. Then choosing $j = n$ yields the second case. $\Box$
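The two cases of Corollary 7.30 are easy to evaluate numerically. The helper below (an illustration, not from the text) returns the bound on $\limsup_n n^{-1}\log\|P^n(x,\cdot)-\pi\|_V$ given the constants $\bar{\lambda}, \bar{b}$ of (7.40) and the Doeblin constant $\varepsilon$:

```python
import math

def coupling_rate_bound(lam_bar, b_bar, eps):
    """Log-scale convergence rate bound of Corollary 7.30 (a negative number)."""
    if b_bar * (1.0 - eps) / lam_bar >= 1.0:   # case B > 1
        return (-math.log(lam_bar) * math.log(1.0 - eps)
                / (math.log(b_bar) - math.log(lam_bar)))
    return math.log(lam_bar)                   # case B = 1
```

For instance, $\bar{\lambda} = 0.5$, $\bar{b} = 2$, $\varepsilon = 0.5$ falls in the first case and yields the exponent $\log(0.5)/2$; the geometric rate itself is the exponential of the returned value.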

In many applications, the bivariate drift condition (7.40) is derived from a univariate drift condition on the kernel $P$. We now show that the univariate drift condition $D(V,\lambda,b)$ translates into the bivariate drift condition (7.40) for a particular choice of the function $\bar{V}$.

Lemma 7.31 Assume that $P$ satisfies the drift condition $D(V,\lambda,b)$ and that, for some $d > 0$, the level set $\{V \le d\}$ is a $(1,\varepsilon)$-Doeblin set. If $\bar{V}(x,x') = \{V(x)+V(x')\}/2$, then the drift condition (7.40) holds with $C = \{V \le d\}$ and
\[ \bar{\lambda} = \lambda + \frac{2b}{1+d} \;, \qquad \bar{b} = \frac{\lambda d + b}{1-\varepsilon} \;. \]

Proof. If $(x,x') \notin C\times C$, then $\bar{V}(x,x') \ge (1+d)/2$. Then,
\[ \bar{P}\bar{V}(x,x') = \tfrac{1}{2}PV(x) + \tfrac{1}{2}PV(x') \le \lambda\bar{V}(x,x') + b \le \lambda\bar{V}(x,x') + \frac{2b}{1+d}\,\bar{V}(x,x') \;. \qquad (7.48) \]

If $(x,x') \in C\times C$, then $\bar{V}(x,x') \le d$. By definition of the kernels $Q$ and $\bar{P}$, this implies


\[
\begin{aligned}
\bar{P}\bar{V}(x,x') = Q\bar{V}(x,x') &= (1-\varepsilon)^{-1}\big[\{PV(x)+PV(x')\}/2 - \varepsilon R\bar{V}(x,x')\big]\\
&\le \frac{1}{2(1-\varepsilon)}\,\{PV(x)+PV(x')\} \le \frac{\lambda}{1-\varepsilon}\,\bar{V}(x,x') + \frac{b}{1-\varepsilon} \le \frac{\lambda d + b}{1-\varepsilon} \;. \qquad (7.49)
\end{aligned}
\]
Gathering (7.48) and (7.49) yields (7.40). $\Box$

We now have all the ingredients to give explicit bounds for the rate of convergence of $\xi P^n$ to the invariant probability measure $\pi$ under the drift condition $D(V,\lambda,b)$ and the assumption that a certain level set of $V$ is a Doeblin set.

Theorem 7.32. Assume that $P$ satisfies the drift condition $D(V,\lambda,b)$ and that, for some $d > 2b/(1-\lambda)-1$, $m \in \mathbb{N}^*$ and $\varepsilon \in (0,1)$, the level set $\{V \le d\}$ is an $(m,\varepsilon)$-Doeblin set. Then $P$ admits a unique invariant probability measure $\pi$ and is $V$-geometrically ergodic:
\[ \|\xi P^n - \pi\|_V \le \left( \xi(V) + \pi(V) + \frac{b}{1-\lambda} + \frac{\bar{b}_m}{1-\bar{\lambda}_m} \right) \rho^{\lfloor n/m\rfloor} \qquad (7.50) \]
with
\[
\rho =
\begin{cases}
\exp\left( -\dfrac{\log(\bar{\lambda}_m)\log(1-\varepsilon)}{\log(\bar{b}_m)-\log(\bar{\lambda}_m)} \right) & \text{if } \bar{b}_m(1-\varepsilon)/\bar{\lambda}_m \ge 1 \;,\\[1ex]
\bar{\lambda}_m & \text{otherwise,}
\end{cases}
\]
where
\[
\begin{aligned}
\bar{\lambda}_m &= \lambda^m + \frac{2b_m}{1+d} < 1 \;, \qquad (7.51)\\
\bar{b}_m &= \frac{\lambda^m d + b_m}{1-\varepsilon} \;, \qquad (7.52)\\
B_m &= 1 \vee \big[\bar{b}_m(1-\varepsilon)/\bar{\lambda}_m\big] \;. \qquad (7.53)
\end{aligned}
\]

Remark 7.33 It always holds that $\rho \le \bar{\lambda}_m$. It may happen that $\rho < \bar{\lambda}_m$ if $\varepsilon$ is sufficiently large. We can also express $\rho = \bar{\lambda}_m^a$ with
\[ a = \left( \frac{-\log(1-\varepsilon)}{\log(\bar{b}_m)-\log(\bar{\lambda}_m)} \right) \vee 1 \;. \]

Proof. The existence and uniqueness of the invariant distribution follow from Theorem 7.19. The proof then consists in putting together all the pieces and tracking the constants.

• By Lemma 7.9, the Markov kernel $P^m$ satisfies the drift condition $D(V,\lambda^m,b_m)$.
• Lemma 7.31 shows that the drift condition (7.40) holds with constants $\bar{\lambda}_m$ and $\bar{b}_m$.


• Since $2b_m/(1-\lambda^m)-1 = 2b/(1-\lambda)-1$, the set $\{V \le d\}$ is a $(1,\varepsilon)$-Doeblin set for $P^m$, with $d > 2b_m/(1-\lambda^m)-1$.
• We may therefore apply Theorem 7.29 to the kernel $P^m$ and the set $C = \{V \le d\}$, with the constants $\bar{\lambda}_m$, $B_m$ and $\bar{b}_m$.

This yields, for $k \ge 0$, $r = 0, \ldots, m-1$ and $j = 0, \ldots, k$,
\[ \big\|\xi P^{km+r} - \pi\big\|_V \le 2\big[\xi P^r\otimes\pi(\bar{V})\big]\,\bar{\lambda}_m^k B_m^j + 2\left(\xi P^r\otimes\pi(\bar{V}) + \frac{\bar{b}_m}{1-\bar{\lambda}_m}\right)(1-\varepsilon)^j \;. \]

Note that $2\,\xi P^r\otimes\pi(\bar{V}) = \xi P^r(V) + \pi(V)$ and, by Lemma 7.9,
\[ \xi P^r(V) \le \xi(V) + \frac{b}{1-\lambda} \;. \]

This yields, for all $n \ge 1$ and $j = 0, \ldots, \lfloor n/m\rfloor$,
\[
\|\xi P^n - \pi\|_V \le \left[\xi(V)+\pi(V)+\frac{b}{1-\lambda}\right]\bar{\lambda}_m^{\lfloor n/m\rfloor} B_m^j + \left(\xi(V)+\pi(V)+\frac{b}{1-\lambda}+\frac{\bar{b}_m}{1-\bar{\lambda}_m}\right)(1-\varepsilon)^j \;.
\]

Choosing the optimal $j$ as in the proof of Corollary 7.30 yields (7.50). $\Box$
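Tracking the constants of Theorem 7.32 is mechanical and can be scripted. The sketch below is illustrative, not from the text; it uses $b_m = b(1-\lambda^m)/(1-\lambda)$, which is the value consistent with $2b_m/(1-\lambda^m) = 2b/(1-\lambda)$ used in the proofs above.

```python
import math

def v_geometric_constants(lam, b, d, m, eps):
    """(lam_bar_m, b_bar_m, B_m, rho) from Theorem 7.32, given a univariate
    drift D(V, lam, b), a level d > 2b/(1-lam) - 1 and an (m, eps)-Doeblin set."""
    b_m = b * (1.0 - lam**m) / (1.0 - lam)        # P^m V <= lam^m V + b_m
    lam_bar = lam**m + 2.0 * b_m / (1.0 + d)      # (7.51)
    b_bar = (lam**m * d + b_m) / (1.0 - eps)      # (7.52)
    B = max(1.0, b_bar * (1.0 - eps) / lam_bar)   # (7.53)
    if b_bar * (1.0 - eps) / lam_bar >= 1.0:
        rho = math.exp(-math.log(lam_bar) * math.log(1.0 - eps)
                       / (math.log(b_bar) - math.log(lam_bar)))
    else:
        rho = lam_bar
    return lam_bar, b_bar, B, rho
```

For example, with $\lambda = 0.5$, $b = 1$, $m = 1$, $d = 9$ and $\varepsilon = 0.5$ one gets $\bar{\lambda}_m = 0.7$ and $\bar{b}_m = 11$.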

7.7 Coupling for stochastically monotone Markov chains

Let $\mathsf{X}$ be a totally ordered set, and denote by $\preceq$ the order relation. For $a \in \mathsf{X}$, denote $(-\infty,a] = \{x \in \mathsf{X} : x \preceq a\}$ and $[a,\infty) = \{x \in \mathsf{X} : a \preceq x\}$. A Markov kernel $P$ on $\mathsf{X}$ is called stochastically monotone if, for all $a \in \mathsf{X}$, the map $x \mapsto P(x,(-\infty,a])$ is nonincreasing. In other words, if $x \preceq x'$, then a random variable $X$ with distribution $P(x,\cdot)$ is stochastically dominated by a random variable $Y$ with distribution $P(x',\cdot)$; we write $X \preceq Y$ for the stochastic domination of $X$ by $Y$. If $X$ and $Y$ are defined on the same probability space and $X \preceq Y$, then a particular realization of $X$ may happen to be larger than the corresponding realization of $Y$. But there exists a random variable $Y'$ (perhaps defined on an enlarged probability space), with the same distribution as $Y$ and such that $X \le Y'$ $\mathbb{P}$-a.s., and the converse is also true.

If $P$ is a stochastically monotone Markov kernel, it is possible to define the bivariate kernel $\bar{P}$ in such a way that the two components $\{X_n,\, n\in\mathbb{N}\}$ and $\{X'_n,\, n\in\mathbb{N}\}$ are pathwise ordered, i.e. their initial order is preserved at all times.

Let $K$ be a Markov kernel on $\mathsf{X}$ and, for each $x \in \mathsf{X}$, let $G_K(x,\cdot)$ be the quantile function associated with the probability measure $K(x,\cdot)$, i.e.
\[ G_K(x,u) = \inf\{y \in \mathsf{X} : K(x,(-\infty,y]) \ge u\} \;. \qquad (7.54) \]


If $K$ is stochastically monotone, then, for each $u \in (0,1)$, the map $x \mapsto G_K(x,u)$ is nondecreasing.

Let $C$ be a $(1,\varepsilon)$-Doeblin set with associated probability measure $\nu$ and let $Q$ be the residual kernel on $\mathsf{X}\times\mathcal{X}$ defined by
\[ Q(x,A) = \frac{P(x,A) - \varepsilon\nu(A)}{1-\varepsilon} \;. \qquad (7.55) \]

Define now the Markov kernel $\bar{P}$ on $\mathsf{X}^2\times\mathcal{X}^{\otimes 2}$ as follows. For $(x,x') \in \mathsf{X}\times\mathsf{X}$ and $A \in \mathcal{X}\otimes\mathcal{X}$,
\[
\bar{P}((x,x'),A) =
\begin{cases}
\int_0^1 \mathbb{1}_A\big(G_P(x,u), G_P(x',u)\big)\, du & \text{if } (x,x') \notin C\times C \;,\\[1ex]
\int_0^1 \mathbb{1}_A\big(G_Q(x,u), G_Q(x',u)\big)\, du & \text{if } (x,x') \in C\times C \;.
\end{cases}
\]

By construction, the set $\Delta = \{(x,x') \in \mathsf{X}\times\mathsf{X} : x \preceq x'\}$ is absorbing for the kernel $\bar{P}$, i.e. for all $(x,x') \in \Delta$, $\bar{P}(x,x';\Delta) = 1$. Indeed, for each $u \in [0,1]$, the functions $x \mapsto G_P(x,u)$ and $x \mapsto G_Q(x,u)$ are nondecreasing. Thus, if $x \preceq x'$, we have
\[
\begin{aligned}
\bar{P}(x,x';\Delta) &= \mathbb{1}_{(C\times C)^c}(x,x')\int_0^1 \mathbb{1}\{G_P(x,u) \le G_P(x',u)\}\, du + \mathbb{1}_{C\times C}(x,x')\int_0^1 \mathbb{1}\{G_Q(x,u) \le G_Q(x',u)\}\, du\\
&= \mathbb{1}_{(C\times C)^c}(x,x') + \mathbb{1}_{C\times C}(x,x') = 1 \;.
\end{aligned}
\]

This result can be expressed in terms of the chain $\{(X_k, X'_k),\, k\in\mathbb{N}\}$ with transition kernel $\bar{P}$: if $x \preceq x'$, then for all $k \ge 1$, $X_k \preceq X'_k$, $\bar{\mathbb{P}}_{x,x'}$-a.s.
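A minimal sketch of this quantile coupling: both chains are driven by the same uniform variable through $G_K(x,u)$, so the initial order is preserved. The kernel below ($X_1 = 0.5\,X_0 + U$ with $U$ uniform on $(0,1)$, hence $G_K(x,u) = 0.5x + u$) is an illustrative choice, not from the text; because the same $u$ drives both chains, the gap contracts deterministically by a factor $1/2$ per step.

```python
import random

random.seed(3)

def G(x, u):                     # quantile function of the toy monotone kernel:
    return 0.5 * x + u           # K(x, .) = law of 0.5*x + Uniform(0, 1)

x, xp = 0.0, 5.0                 # ordered initial states: x <= x'
gaps = []
for _ in range(10):
    u = random.random()          # the SAME uniform drives both chains
    x, xp = G(x, u), G(xp, u)
    gaps.append(xp - x)          # order is preserved at every step
```

After $k$ steps the gap is exactly $5 \cdot 2^{-k}$, independently of the draws.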

Lemma 7.34 Let $V : \mathsf{X}\to[1,\infty)$ be a nondecreasing function. Assume that the kernel $P$ satisfies the drift condition $D(V,\lambda,b)$ and that there exists $x_0 \in \mathsf{X}$ such that $C = (-\infty,x_0]$ is a $(1,\varepsilon)$-Doeblin set and $V(x_0) > b/(1-\lambda)$. Define $\bar{V}(x,x') = V(x\vee x')$. Then the kernel $\bar{P}$ satisfies the drift condition
\[ \bar{P}\bar{V} \le \bar{\lambda}\bar{V}\mathbb{1}_{(C\times C)^c} + \bar{b}\mathbb{1}_{C\times C} \;, \]
with
\[ \bar{\lambda} = \lambda + \frac{b}{V(x_0)} \;, \qquad \bar{b} = \frac{\lambda\sup_C V + b}{1-\varepsilon} \;. \]

Proof. Assume that $x \preceq x'$. Using the fact that $\Delta$ is absorbing for $\bar{P}$, we obtain, if $(x,x') \notin C\times C$ (so that $x' \notin C$ and hence $V(x') \ge V(x_0)$),
\[
\bar{P}\bar{V}(x,x') = \bar{\mathbb{E}}_{x,x'}\big[V(X_1\vee X'_1)\big] = \bar{\mathbb{E}}_{x,x'}\big[V(X'_1)\big] = PV(x') \le \lambda V(x') + b \le \big(\lambda + b/V(x_0)\big)V(x') = \big(\lambda + b/V(x_0)\big)\bar{V}(x,x') \;.
\]

If $(x,x') \in C\times C$, then


\[ \bar{P}\bar{V}(x,x') = \bar{\mathbb{E}}_{x,x'}\big[V(X'_1)\big] = QV(x') \le (1-\varepsilon)^{-1}\, PV(x') \le \frac{\lambda V(x') + b}{1-\varepsilon} \le \frac{\lambda\sup_C V + b}{1-\varepsilon} \;. \qquad \Box \]

Theorem 7.35. Assume that $P$ is a stochastically monotone Markov kernel and that the drift condition $D(V,\lambda,b)$ holds. Assume that there exist $d > 2b/(1-\lambda)-1$ and $\varepsilon \in (0,1)$ such that the level set $\{V \le d\}$ is a $(1,\varepsilon)$-Doeblin set. Then $P$ admits a unique invariant probability measure $\pi$ and is $V$-geometrically ergodic: there exists a constant $C$ such that
\[ \|\xi P^n - \pi\|_V \le C\rho^n \qquad (7.56) \]
with
\[
\rho =
\begin{cases}
\exp\left( -\dfrac{\log(\bar{\lambda})\log(1-\varepsilon)}{\log(\bar{b})-\log(\bar{\lambda})} \right) & \text{if } \bar{b}(1-\varepsilon)/\bar{\lambda} \ge 1 \;,\\[1ex]
\bar{\lambda} & \text{otherwise.}
\end{cases}
\]

7.8 Examples

7.8.1 Discrete state space examples

7.8.1.1 INAR process

Consider the INAR (or Galton–Watson with immigration) process $\{X_n,\, n\in\mathbb{N}\}$ introduced in 1.9.8, defined by $X_0$ and
\[ X_{n+1} = \sum_{i=1}^{X_n} \xi^{(n+1)}_i + Y_{n+1} \;. \]

Set $m = \mathbb{E}\big[\xi^{(1)}_1\big]$. If $m > 1$, then the corresponding Galton–Watson process without immigration diverges to infinity, so immigration only makes the divergence faster. If $m < 1$, the Galton–Watson process reaches $0$ (the population becomes extinct) in a finite number of generations. With immigration, a steady state is possible. Let $P$ be the Markov kernel of the INAR process and let $V$ be the identity function on $\mathbb{N}$, i.e. $V(x) = x$ for all $x \in \mathbb{N}$. Then the kernel $P$ satisfies a geometric drift condition with Lyapunov function $V$. Indeed,
\[ PV(x) = mx + \mathbb{E}[Y_1] = mV(x) + \mathbb{E}[Y_1] \;. \]
Fix $\eta \in (0,1-m)$ and let $k_0$ be the smallest integer such that $k_0 > \mathbb{E}[Y_1]/\eta$ (assuming implicitly the latter expectation to be finite). Define $C = \{0, \ldots, k_0\}$ and $b = \mathbb{E}[Y_1]$. These choices yield

Page 142: 0.0.1 Recommandations typographiques · 1 Todo Lists 0.0.1 Recommandations typographiques 1.Pour les notations, si on veut ecrire un ensemble, c est bon d utiliser la com-mande \ensemble{x

7.8 Examples 189

\[ PV \le (m+\eta)V + b\mathbb{1}_C \;. \]

Let $\nu$ denote the distribution of $Y_1$. Then, for $x, y \in \mathbb{N}$, we have
\[
P(x,y) = \mathbb{P}\left( \sum_{i=1}^{x} \xi^{(1)}_i + Y_1 = y \right) \ge \mathbb{P}\left( \sum_{i=1}^{x} \xi^{(1)}_i + Y_1 = y\,,\; \xi^{(1)}_1 = 0, \ldots, \xi^{(1)}_x = 0 \right) = \mathbb{P}\big(\xi^{(1)}_1 = 0\big)^x\, \nu(y) \;.
\]
Since $m < 1$ implies that $\mathbb{P}(\xi^{(1)}_1 = 0) > 0$, this yields, for $x \le k_0$,
\[ P(x,y) \ge \varepsilon\nu(y) \;, \]
with $\varepsilon = \mathbb{P}(\xi^{(1)}_1 = 0)^{k_0}$. Thus $C$ is a $(1,\varepsilon)$-Doeblin set.

Note also that the INAR process is stochastically monotone since, given $X_0 = x_0$ and $x > x_0$,
\[ X_1 = \sum_{j=1}^{x_0} \xi^{(1)}_j + Y_1 \le \sum_{j=1}^{x} \xi^{(1)}_j + Y_1 \quad \mathbb{P}\text{-a.s.} \]

7.8.2 Functional Autoregressive Model

The first-order functional autoregressive model on $\mathbb{R}^d$ is defined iteratively by $X_k = m(X_{k-1}) + Z_k$, where $\{Z_k,\, k\in\mathbb{N}\}$ is an i.i.d. sequence of random vectors independent of $X_0$ and $m : \mathbb{R}^d\to\mathbb{R}^d$ is a locally bounded measurable function satisfying
\[ \limsup_{|x|\to\infty} \frac{|m(x)|}{|x|} < 1 \;. \qquad (7.57) \]

Assume that the distribution of $Z_0$ has a density $q$ w.r.t. Lebesgue measure on $\mathbb{R}^d$; assume in addition that $q$ is bounded away from zero on every compact set and that $\mathbb{E}[|Z_0|] < \infty$. Let $K$ be a compact set. Then, for every $x \in K$,
\[
P(x,A) = \int_A q\big(y - m(x)\big)\,\mathrm{Leb}(dy) \ge \int_{A\cap K} q\big(y - m(x)\big)\,\mathrm{Leb}(dy) \ge \varepsilon_K\,\mathrm{Leb}|_K(A) \;,
\]
where $\varepsilon_K = \min_{(t,x)\in K\times K} q\big(t - m(x)\big)$ and $\mathrm{Leb}|_K(A) = \mathrm{Leb}(A\cap K)$. Therefore, $\Delta_{\mathrm{TV},K} \le 1-\varepsilon_K$ for every compact subset $K$ of $\mathbb{R}^d$. Set $V(x) = 1 + |x|$. Then,


\[ PV(x) = 1 + \mathbb{E}\big[|m(x)+Z_1|\big] \le 1 + |m(x)| + \mathbb{E}[|Z_1|] \;. \qquad (7.58) \]

Using (7.57), there exist $\lambda \in [0,1)$ and $r \in \mathbb{R}_+$ such that $|m(x)|/|x| \le \lambda$ for all $|x| \ge r$. For $|x| \ge r$, this implies
\[ PV(x) \le 1 + \lambda|x| + \mathbb{E}|Z_1| = \lambda V(x) + 1 - \lambda + \mathbb{E}|Z_1| \;. \]
Since $m$ is bounded on compact sets, (7.58) implies that $PV$ is also bounded on compact sets. Thus, setting $b = (1-\lambda+\mathbb{E}|Z_1|) \vee \sup_{|x|\le r} PV(x)$, we obtain $PV(x) \le \lambda V(x) + b$.
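For a concrete instance, the drift inequality can be verified numerically. The sketch below takes $m(x) = 0.5x$ and standard Gaussian noise (illustrative choices satisfying (7.57) with $\lambda = 1/2$ and $r = 0$, so $b = 1 - \lambda + \mathbb{E}|Z_1|$ with $\mathbb{E}|Z_1| = \sqrt{2/\pi}$), and checks $PV(x) \le \lambda V(x) + b$ by Monte Carlo with a small slack for sampling noise.

```python
import math, random

random.seed(11)
lam = 0.5                                   # m(x) = 0.5 x, so (7.57) holds
b = 1.0 - lam + math.sqrt(2.0 / math.pi)    # 1 - lambda + E|Z_1|; here r = 0

def pv_estimate(x, n=20_000):
    """Monte Carlo estimate of PV(x) = 1 + E|m(x) + Z_1|, with V(x) = 1 + |x|."""
    return 1.0 + sum(abs(lam * x + random.gauss(0.0, 1.0)) for _ in range(n)) / n

drift_ok = all(pv_estimate(x) <= lam * (1.0 + abs(x)) + b + 0.05
               for x in (0.0, 1.0, 4.0, 10.0))
```

At $x = 0$ the bound is tight (equality up to Monte Carlo error), which is why a small tolerance is needed.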

7.8.3 ARCH(1) model

Let $X_k = (\alpha_0 + \alpha_1 X^2_{k-1})^{1/2} Z_k$, where $\{Z_k,\, k\in\mathbb{N}\}$ is an i.i.d. sequence, be an ARCH(1) sequence. Assume that $\alpha_0 > 0$, $\alpha_1 > 0$ and that the random variable $Z_1$ has a density $g$ which is bounded away from zero on a neighborhood of $0$, i.e. $g(z) \ge g_{\min}\mathbb{1}_{[-a,a]}(z)$ for some $a > 0$. Assume also that there exists $s \in (0,1]$ such that $\mu_{2s} = \mathbb{E}\big[Z_0^{2s}\big] < \infty$. Set $V(x) = 1 + x^{2s}$. Since $(x+y)^s \le x^s + y^s$ for all $x, y > 0$, we have
\[
PV(x) = \mathbb{E}_x[V(X_1)] = 1 + (\alpha_0 + \alpha_1 x^2)^s\mu_{2s} \le 1 + \alpha_0^s\mu_{2s} + \alpha_1^s\mu_{2s}\,x^{2s} \le \lambda V(x) + b \;,
\]
with $\lambda = \alpha_1^s\mu_{2s}$ and $b = 1 - \alpha_1^s\mu_{2s} + \alpha_0^s\mu_{2s}$. Thus, provided that $\alpha_1^s\mu_{2s} < 1$, the transition kernel $P$ satisfies the geometric drift condition $D(V,\lambda,b)$. We now show that any interval $[-c,c]$ with $c > 0$ is a Doeblin set (see Definition 7.12). For $A \in \mathcal{B}(\mathbb{R})$ and $x \in [-c,c]$, we have
\[
\begin{aligned}
P(x,A) &= \int_{-\infty}^{\infty} \mathbb{1}_A\big((\alpha_0+\alpha_1 x^2)^{1/2} z\big)\, g(z)\, dz = (\alpha_0+\alpha_1 x^2)^{-1/2}\int_{-\infty}^{\infty} \mathbb{1}_A(v)\, g\big((\alpha_0+\alpha_1 x^2)^{-1/2} v\big)\, dv\\
&\ge (\alpha_0+\alpha_1 c^2)^{-1/2}\, g_{\min} \int_{-\infty}^{\infty} \mathbb{1}_A(v)\,\mathbb{1}_{[-a,a]}\big(\alpha_0^{-1/2} v\big)\, dv\\
&= 2a\alpha_0^{1/2}(\alpha_0+\alpha_1 c^2)^{-1/2}\, g_{\min}\; \frac{1}{2a\alpha_0^{1/2}}\int_{-a\alpha_0^{1/2}}^{a\alpha_0^{1/2}} \mathbb{1}_A(v)\, dv \;.
\end{aligned}
\]
If we set $\varepsilon = 2a\alpha_0^{1/2}(\alpha_0+\alpha_1 c^2)^{-1/2}\, g_{\min}$ and define the measure $\nu$ by
\[ \nu(A) = \frac{1}{2a\sqrt{\alpha_0}}\,\mathrm{Leb}\big(A\cap[-a\sqrt{\alpha_0}, a\sqrt{\alpha_0}]\big) \;, \]
we obtain that $P(x,A) \ge \varepsilon\nu(A)$ for all $x \in [-c,c]$. Thus any bounded interval, and hence every compact set of $\mathbb{R}$, is a Doeblin set. Applying Theorem 7.19, it follows that $P$ is $V$-geometrically ergodic: there exist a constant $C$ and $\rho \in (0,1)$ such that, for all $f$ with $|f|_V < \infty$,
\[ |P^n f(x) - \pi(f)| \le C\rho^n\, V(x)\, |f|_V \;. \]
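With $s = 1$ and standard Gaussian innovations (so $\mu_2 = 1$) the drift bound above holds with equality, $PV(x) = 1 + \alpha_0 + \alpha_1 x^2 = \lambda V(x) + b$. The sketch below (illustrative parameter values) checks this by Monte Carlo:

```python
import random

random.seed(5)
a0, a1 = 0.5, 0.5                 # alpha_0, alpha_1 (illustrative)
lam, b = a1, 1.0 - a1 + a0        # with Z ~ N(0,1) and s = 1, mu_2 = 1

def pv_estimate(x, n=50_000):
    """Monte Carlo estimate of PV(x) = E[1 + X_1^2 | X_0 = x]."""
    return 1.0 + sum((a0 + a1 * x * x) * random.gauss(0.0, 1.0) ** 2
                     for _ in range(n)) / n

est = pv_estimate(2.0)            # exact value: 1 + a0 + 4*a1 = 3.5 = lam*V(2) + b
```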

7.8.4 Stochastically monotone Markov chains

7.8.5 Metropolis-Hastings algorithms

Consider a Metropolis-Hastings Markov kernel $P$ on a topological space $\mathsf{X}$ with target density $\pi$ and proposal density $q(x,\cdot)$ with respect to a dominating measure $\lambda \in \mathcal{M}_+(\mathcal{X})$. Assume that (a) $\pi$ is bounded above on compact sets of $\mathsf{X}$ and (b) $(x,y) \mapsto q(x,y)$ is bounded from below on compact sets of $\mathsf{X}\times\mathsf{X}$.

Let $C$ be a non-empty compact set. Denote $\pi_{\max,C} := \sup_{x\in C}\pi(x) < \infty$ and $q_{\min,C} := \inf_{x,y\in C} q(x,y) > 0$. Choose $A \subseteq C$ and, for a given $x \in \mathsf{X}$, denote the set where moves are potentially rejected by
\[ R_x = \left\{ y \in A : \frac{\pi(y)}{\pi(x)}\,\frac{q(y,x)}{q(x,y)} < 1 \right\} \]
and set $A_x = A\setminus R_x$, the region where all moves are accepted. By construction, for $x \in C$,

\[
\begin{aligned}
P(x,A) &\ge \int_{R_x} q(x,y)\min\left\{\frac{\pi(y)}{\pi(x)}\frac{q(y,x)}{q(x,y)},\, 1\right\}\lambda(dy) + \int_{A_x} q(x,y)\min\left\{\frac{\pi(y)}{\pi(x)}\frac{q(y,x)}{q(x,y)},\, 1\right\}\lambda(dy)\\
&= \int_{R_x} \frac{\pi(y)}{\pi(x)}\, q(y,x)\,\lambda(dy) + \int_{A_x} q(x,y)\,\lambda(dy)\\
&\ge (q_{\min,C}/\pi_{\max,C})\int_{R_x} \pi(y)\,\lambda(dy) + q_{\min,C}\int_{A_x} \frac{\pi(y)}{\pi_{\max,C}}\,\lambda(dy)\\
&\ge (q_{\min,C}/\pi_{\max,C})\,\pi(A) \;.
\end{aligned}
\]

This shows that any compact set is small and that $\pi|_C$, the restriction of $\pi$ to the set $C$, is an irreducibility measure.

The Random Walk Metropolis (RWM) algorithm is an important special case of ??. Assume that the target distribution has a density $\pi$ w.r.t. the $d$-dimensional Lebesgue measure $\mathrm{Leb}$ on $\mathbb{R}^d$ which is bounded from above on compact sets. Denote by $q$ the density of the proposal distribution (i.e. $Q(x,A) = \int_A q(x,y)\,\mathrm{Leb}(dy) = \int_A q(y-x)\,\mathrm{Leb}(dy)$). Assume that $q$ is positive, continuous and bounded. Under these assumptions, $(x,y) \mapsto q(x,y) = q(y-x)$ is bounded from below on compact sets and ?? shows that every compact set is small.

For the RWM algorithm, it is possible to state a partial converse result, which explains the importance of the boundedness condition on the target density $\pi$. Before going further, we need to obtain a preliminary estimate on the Metropolis-Hastings transition kernel.

Suppose that $P$ is a Metropolis-Hastings kernel on an arbitrary state space $\mathsf{X}$ with proposal kernel $Q$. Then, for any set $A$ not containing $x$, the following upper bound holds for all $m$:
\[ P^m(x,A) \le \sum_{i=0}^{m} \binom{m}{i}\, Q^i(x,A) \;. \qquad (7.59) \]

This result follows by an easy induction argument. The statement (7.59) is elementary for $m = 1$. Suppose now that the statement is true for $m = k-1$, $k \ge 1$. Then, using $P(y,A) \le \mathbb{1}_A(y) + Q(y,A)$,
\[
\begin{aligned}
P^k(x,A) &= \int_{\mathsf{X}} P^{k-1}(x,dy)\, P(y,A) \le \int_{\mathsf{X}} \left( \sum_{i=0}^{k-1}\binom{k-1}{i}\, Q^i(x,dy) \right)\big(\mathbb{1}_A(y) + Q(y,A)\big)\\
&= \sum_{i=0}^{k-1}\left\{\binom{k-1}{i} + \binom{k-1}{i-1}\right\} Q^i(x,A) + Q^k(x,A) = \sum_{i=0}^{k}\binom{k}{i}\, Q^i(x,A) \;.
\end{aligned}
\]
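The bound (7.59) can be verified numerically on a small finite chain. The sketch below builds a Metropolis-Hastings kernel for an illustrative three-point target with a uniform (symmetric) proposal, and compares $P^m(x,A)$ with the binomial sum by matrix powers:

```python
from math import comb

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, m):
    n = len(A)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = matmul(R, A)
    return R

pi = [1 / 6, 2 / 6, 3 / 6]                 # illustrative target on {0, 1, 2}
Q = [[1 / 3] * 3 for _ in range(3)]        # uniform proposal, q(x,y) = q(y,x)
P = [[0.0] * 3 for _ in range(3)]
for x in range(3):
    for y in range(3):
        if y != x:
            P[x][y] = Q[x][y] * min(pi[y] / pi[x], 1.0)
    P[x][x] = 1.0 - sum(P[x])              # rejection mass stays at x

x, A = 0, 2                                # A = {2} does not contain x = 0
bound_holds = all(
    matpow(P, m)[x][A]
    <= sum(comb(m, i) * matpow(Q, i)[x][A] for i in range(m + 1)) + 1e-12
    for m in range(1, 6)
)
```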

Here we consider the simple Metropolis-Hastings algorithm on $\mathbb{R}$ (see ??). Recall that a candidate is drawn from the transition density $q(x,\cdot)$ and is accepted with probability $\alpha(x,y)$ given by
\[ \alpha(x,y) = \frac{\pi(y)}{\pi(x)}\,\frac{q(y,x)}{q(x,y)} \wedge 1 \;, \]
where $\pi$ is the target density.

where π is the target density. Thus actual transitions of the Metropolis-Hastingschain takes place according to the transition density

p(x,y) = q(x,y)α(x,y) , (7.60)

for y 6= x, and the probability of remaining at the same point is given by

P(x,x) =∫

q(x,y)1−α(x,y)dy . (7.61)

This choice of α ensures that the Markov kernel P is reversible with respect to π

and thus π is the stationary distribution.
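A minimal sketch of this Metropolis-Hastings algorithm, targeting (as an illustrative choice) the standard normal density with a symmetric uniform proposal, so that the acceptance ratio reduces to $\pi(y)/\pi(x) \wedge 1$:

```python
import math, random

random.seed(42)

def log_pi(x):                    # standard normal target (illustrative)
    return -0.5 * x * x

def metropolis(n, step=2.0):
    x, out = 0.0, []
    for _ in range(n):
        y = x + random.uniform(-step, step)   # symmetric proposal q(y - x)
        if math.log(random.random()) < log_pi(y) - log_pi(x):
            x = y                 # accept with probability min{1, pi(y)/pi(x)}
        out.append(x)
    return out

xs = metropolis(50_000)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
```

The empirical mean and variance should be close to $0$ and $1$ up to Monte Carlo error.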


Assume that the densities $\pi$ and $q$ are continuous and positive. Then every compact set $C \subset \mathbb{R}$ such that $\mathrm{Leb}(C) > 0$ is a Doeblin set for $P$. Indeed, by positivity and continuity, we have $\sup_{x\in C}\pi(x) < \infty$ and $\inf_{x,y\in C} q(x,y) > 0$. For fixed $x \in C$ and $A \subset C$, denote
\[ R_x(A) = \left\{ y \in A : \frac{\pi(y)}{\pi(x)}\,\frac{q(y,x)}{q(x,y)} < 1 \right\} \]
and $A_x(A) = A\setminus R_x(A)$. Then,

and Ax(A) = A\Rx(A). Then,

\[
\begin{aligned}
P(x,A) &\ge \int_{R_x(A)} q(x,y)\,\alpha(x,y)\, dy + \int_{A_x(A)} q(x,y)\,\alpha(x,y)\, dy\\
&= \int_{R_x(A)} \frac{\pi(y)}{\pi(x)}\, q(y,x)\, dy + \int_{A_x(A)} q(x,y)\, dy\\
&\ge \frac{\varepsilon}{d}\int_{R_x(A)} \pi(y)\, dy + \frac{\varepsilon}{d}\int_{A_x(A)} \pi(y)\, dy = \varepsilon d^{-1}\,\pi(A) \;,
\end{aligned}
\]
with $\varepsilon = \inf_{x,y\in C} q(x,y)$ and $d = \sup_{x\in C}\pi(x)$.

with ε = infx,y∈C q(x,y) and d = supx∈C π(x). Hence, for all A ∈B(R) and x ∈C,

P(x,A)≥ P(x,A∩C)≥ επ(C)

dπ(A∩C)

π(C),

which shows that $C$ is $1$-small.

We will next prove that the kernel $P$ satisfies a geometric drift condition under additional assumptions. Assume that $q(x,y) = q(y-x)$, where $q$ is a symmetric, positive and continuous density on $\mathbb{R}$. This implies that the acceptance probability is given by $\alpha(x,y) = \{\pi(y)/\pi(x)\}\wedge 1$, and the procedure is then simply known as the Metropolis algorithm.

Assume also that $\pi$ is symmetric and log-concave in the tails, i.e. there exist $\beta > 0$ and some $x_0 > 0$ such that
\[
\begin{aligned}
\log\pi(x) - \log\pi(y) &\ge \beta(y-x) \;, \qquad y \ge x \ge x_0 \;,\\
\log\pi(x) - \log\pi(y) &\ge \beta(x-y) \;, \qquad y \le x \le -x_0 \;.
\end{aligned}
\]

This implies that if $|y| > x \ge x_0$, then $\pi(y)/\pi(x) \le \mathrm{e}^{-\beta(|y|-x)} < 1$ and thus $\alpha(x,y) < 1$. We will prove that the Foster-Lyapunov drift condition is satisfied with $V(x) = \mathrm{e}^{s|x|}$ for any $s \in (0,\beta)$. Indeed, for $x > x_0$, we have

\[
\begin{aligned}
PV(x) &= \int_{|y|>x} \frac{\pi(y)}{\pi(x)}\, V(y)\, q(x,y)\, dy + \int_{|y|\le x} \alpha(x,y)\, V(y)\, q(x,y)\, dy\\
&\le V(x)\int_{|y|>x} \mathrm{e}^{-(\beta-s)(|y|-x)}\, q(x,y)\, dy + V(x)\int_{|y|\le x} \mathrm{e}^{s(|y|-x)}\, q(x,y)\, dy\\
&= V(x)\int_0^{\infty} \mathrm{e}^{-(\beta-s)z}\, q(z)\, dz + V(x)\int_{-\infty}^{-2x} \mathrm{e}^{(\beta-s)(z+2x)}\, q(z)\, dz\\
&\quad + V(x)\int_{-2x}^{-x} \mathrm{e}^{-s(z+2x)}\, q(z)\, dz + V(x)\int_{-x}^{0} \mathrm{e}^{sz}\, q(z)\, dz = V(x)\, g(x) \;,
\end{aligned}
\]


with $g(x) \le 1$ for all $x$ and

\[ \lim_{x\to\infty} g(x) = \int_0^{\infty} \mathrm{e}^{-(\beta-s)z}\, q(z)\, dz + \int_0^{\infty} \mathrm{e}^{-sz}\, q(z)\, dz < 1 \;. \]

Thus, for large enough $x_1$ and $x \ge x_1$, it holds that $PV(x) \le \lambda V(x)$ for some $\lambda \in (0,1)$, which depends on $\beta$ and $s$. Note now that, for any $x \in \mathbb{R}$, we have
\[
PV(x) = \int q(x,y)\,\alpha(x,y)\, V(y)\, dy + V(x)\int \{1-\alpha(x,y)\}\, q(x,y)\, dy \le \int q(x,y)\, V(y)\, dy + V(x) \le V(x)\left\{ \int q(y)\, V(y)\, dy + 1 \right\} \;.
\]

Thus, if we assume that $\int q(y)V(y)\,dy < \infty$, we obtain that the kernel $P$ satisfies the Foster-Lyapunov drift condition. Theorem 7.19 then shows that the Metropolis algorithm is $V$-geometrically ergodic.

7.8.6 Geometric convergence of functional autoregressive models

In this section, we consider the general functional autoregressive model
\[ X_k = F(X_{k-1}, \ldots, X_{k-p}; Z_k) \;. \qquad (7.62) \]

Consider the following assumptions:

(NL1) $\mathsf{X}$ and $\mathsf{Z}$ are open subsets of $\mathbb{R}^m$ and $F : \mathsf{X}^p\times\mathsf{Z}\to\mathsf{X}$ is a measurable function.
(NL2) $\{Z_k,\, k\in\mathbb{N}\}$ is an i.i.d. sequence of $\mathsf{Z}$-valued random variables, independent of the initial state $\mathbf{X}_0 = (X_0, X_{-1}, \ldots, X_{-p+1})$.
(NL3) The distribution of $Z_0$ has a nontrivial Lebesgue component with density $q$.
(NL4) For any $\mathbf{x} := (x_{-1}, \ldots, x_{-p}) \in \mathsf{X}^p$, $F(\mathbf{x};\cdot) : \mathsf{Z}\to\mathsf{X}$ is a diffeomorphism. The Jacobian of its inverse $F^{-1}(\mathbf{x};\cdot)$ is denoted by $J_{F^{-1}}(\mathbf{x};\cdot)$.

We denote by Xk := (Xk, . . . ,Xk−p+1) the lag vector. Xk, k ∈N is a Markov chainwith Markov kernel P on Xp. For x = (x−1, . . . ,x−p) ∈ Xp and A ∈X ⊗p, we set

R(x,A) =∫1A(F(x;z), x−1, . . . ,x−p+1)q(z)Leb(dz) . (7.63)

Clearly, under (NL3), for all x ∈ Xp and A ∈X ⊗p, P(x,A) ≥ R(x,A) and for anyinteger k, Pk(x,A)≥ Rk(x,A). Consider the function G= (G0, . . . ,Gp−1) :X×Zp→X, defined recursively as follows


7.8 Examples 195

\begin{align*}
G_0(x, z) &:= F(x_{-1}, \dots, x_{-p}; z_0) \\
G_1(x, z) &:= F(G_0(x, z), x_{-1}, \dots, x_{-p+1}; z_1) \\
&\;\;\vdots \\
G_{p-1}(x, z) &:= F(G_{p-2}(x, z), \dots, G_0(x, z), x_{-1}; z_{p-1})
\end{align*}
For any given x = (x−1, …, x−p), the function z = (z0, …, z_{p−1}) ↦ G(x, z) is invertible. Denote by G^{−1}(x, ·) = [G0^{−1}(x, ·), …, G_{p−1}^{−1}(x, ·)] its inverse. If G(x; z) = G(x−1, x−2, …, x−p; z0, z1, …, z_{p−1}) = x′ = (x′0, …, x′_{p−1}) ∈ X^p, this inverse is given by
\begin{align*}
z_0 &:= F^{-1}(x_{-1}, \dots, x_{-p}; x'_0) = G_0^{-1}(x, x') , \\
z_1 &:= F^{-1}(x'_0, x_{-1}, \dots, x_{-p+1}; x'_1) = G_1^{-1}(x, x') , \\
&\;\;\vdots \\
z_{p-1} &:= F^{-1}(x'_{p-2}, \dots, x'_0, x_{-1}; x'_{p-1}) = G_{p-1}^{-1}(x, x') .
\end{align*}

The Jacobian matrix of G^{−1}(x, ·),
\[
J_{G^{-1}(x,\cdot)}(x') =
\begin{pmatrix}
\dfrac{\partial G_0^{-1}(x,x')}{\partial x'_0} & \cdots & \dfrac{\partial G_0^{-1}(x,x')}{\partial x'_{p-1}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial G_{p-1}^{-1}(x,x')}{\partial x'_0} & \cdots & \dfrac{\partial G_{p-1}^{-1}(x,x')}{\partial x'_{p-1}}
\end{pmatrix} , \qquad (7.64)
\]
is block-triangular, since ∂G_i^{−1}(x, ·)/∂x′_j = 0 for 0 ≤ i < j ≤ p − 1. The Jacobian determinant |J_{G^{−1}(x,·)}| is therefore the product of the diagonal-block Jacobians, i.e.
\[
\big|J_{G^{-1}(x,\cdot)}(x')\big| = \Big| J_{F^{-1}(x_{-1},\dots,x_{-p},\cdot)}(x'_0) \Big| \cdots \Big| J_{F^{-1}(x'_{p-2},\dots,x'_0,x_{-1},\cdot)}(x'_{p-1}) \Big| ,
\]
where J_{F^{−1}(x−1,…,x−p,·)} is the Jacobian matrix of the function F^{−1}(x−1, …, x−p, ·). Denoting q(z) = q(z0, …, z_{p−1}) = q(z0) ⋯ q(z_{p−1}), we get for A ∈ X^{⊗p},

\[
R^p(x, A) = \int \mathbf{1}_A\big(G(x, z)\big)\, q(z)\, \mathrm{Leb}(dz) = \int_A g_p(x, x')\, \mathrm{Leb}(dx') , \qquad (7.65)
\]
where
\[
g_p(x, x') := \prod_{k=0}^{p-1} q\big(F^{-1}(x'_{k-1}, \dots, x'_0, x_{-1}, \dots, x_{-p+k}; x'_k)\big)
\prod_{k=0}^{p-1} \Big| J_{F^{-1}(x'_{k-1},\dots,x'_0,\,x_{-1},\dots,x_{-p+k},\cdot)}(x'_k) \Big| . \qquad (7.66)
\]


Theorem 7.36. Assume (NL1)–(NL4) and that
\[
(x_{-1}, \dots, x_{-p}, x'_0) \mapsto q\big(F^{-1}(x_{-1}, \dots, x_{-p}; x'_0)\big) , \qquad
(x_{-1}, \dots, x_{-p}, x'_0) \mapsto \big|J_{F^{-1}(x_{-1},\dots,x_{-p},\cdot)}(x'_0)\big|
\]
are bounded away from 0 on the compact sets of X^{p+1}. Then any compact set C such that Leb(C) > 0 is a (p, ε)-Doeblin set with Leb_C(·) = [Leb(C)]^{−1} Leb(C ∩ ·) and some ε > 0.

Proof. Let C be a compact set. It follows from (7.66) that, since q∘F^{−1} and |J_{F^{−1}}| are bounded away from zero on compact sets, inf_{(x,x′)∈C×C} g_p(x, x′) ≥ c > 0. Therefore, for any x ∈ C and A ∈ X^{⊗p}, we get
\[
R^p(x, A) = \int_A g_p(x, x')\, \mathrm{Leb}(dx') \geq \int_{A\cap C} g_p(x, x')\, \mathrm{Leb}(dx') \geq c\, \mathrm{Leb}(C)\, \frac{\mathrm{Leb}(C\cap A)}{\mathrm{Leb}(C)} ,
\]
showing that C is a Doeblin set. □

Example 7.37. Consider the functional autoregressive model
\[
X_k = a(X_{k-1}, \dots, X_{k-p}) + b(X_{k-1}, \dots, X_{k-p}) Z_k , \qquad (7.67)
\]
where {Zk, k ∈ N} is an i.i.d. sequence on R^m whose density q is bounded away from zero on any compact set. Set
\[
F(x; z) = a(x) + b(x) z , \qquad x := (x_{-1}, \dots, x_{-p}) \in \mathbb{R}^{pm} , \quad z \in \mathbb{R}^m ,
\]
where a : R^{pm} → R^m and b : R^{pm} → R^{m×m}. Assume that for each x ∈ R^{pm}, b(x) is an m × m invertible matrix, that b^{−1} and a are bounded on compact sets, and that det(b^{−1}) is bounded away from zero on compact sets. Under these assumptions, F^{−1}(x; x′) = b^{−1}(x)(x′ − a(x)) and J_{F^{−1}(x,·)}(x′) = det(b^{−1}(x)), so that q∘F^{−1} and |J_{F^{−1}}| are bounded away from zero on compact sets.

This type of model includes some "regular" nonlinear autoregressive models, such as the ARCH(p) model, where a(x−1, …, x−p) = 0 and
\[
b(x_{-1}, \dots, x_{-p}) = \sqrt{\alpha_0 + \alpha_1 x_{-1}^2 + \dots + \alpha_p x_{-p}^2} ,
\]
with α_i, i ∈ {0, …, p}, nonnegative, and the smooth transition autoregressive models; see Theorem 1.48.
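As an illustration (not part of the book), a scalar ARCH(1) chain of the form above can be simulated directly; the function name and parameter values below are invented for the example:

```python
import numpy as np

def simulate_arch(alpha0, alpha1, n_steps, x0=0.0, rng=None):
    """Simulate a scalar ARCH(1) chain X_k = b(X_{k-1}) Z_k with
    b(x) = sqrt(alpha0 + alpha1 * x**2) and Z_k i.i.d. standard normal:
    the functional autoregressive model (7.67) with a = 0 and p = 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.empty(n_steps)
    x = x0
    for k in range(n_steps):
        x = np.sqrt(alpha0 + alpha1 * x**2) * rng.standard_normal()
        xs[k] = x
    return xs

xs = simulate_arch(alpha0=1.0, alpha1=0.5, n_steps=100_000,
                   rng=np.random.default_rng(1))
# For alpha1 < 1 the chain has a stationary distribution with
# E[X_k^2] = alpha0 / (1 - alpha1), here 2.0.
print(xs.var())
```

The empirical variance of a long run should settle near the stationary value alpha0 / (1 - alpha1).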

These assumptions also cover models for which the regression function is not regular, like the ℓ-regime scalar (m = 1) SETAR model (see Theorem 1.47),
\[
X_k =
\begin{cases}
\sum_{i=1}^{q_1} \phi_i^{(1)} X_{k-i} + \sigma^{(1)} Z_k & \text{if } X_{k-d} < r_1 , \\
\sum_{i=1}^{q_2} \phi_i^{(2)} X_{k-i} + \sigma^{(2)} Z_k & \text{if } r_1 \leq X_{k-d} \leq r_2 , \\
\qquad \vdots \\
\sum_{i=1}^{q_\ell} \phi_i^{(\ell)} X_{k-i} + \sigma^{(\ell)} Z_k & \text{if } r_{\ell-1} < X_{k-d} ,
\end{cases}
\qquad (7.68)
\]
where −∞ = r0 < r1 < r2 < ⋯ < r_{ℓ−1} < r_ℓ = ∞ and σ^{(j)} > 0 for j ∈ {1, …, ℓ}. In such a case, the regression functions are piecewise linear:
\begin{align*}
a(x_{-1}, x_{-2}, \dots, x_{-p}) &= \sum_{j=1}^{\ell} \mathbf{1}_{R_j}(x_{-d}) \sum_{i=1}^{q_j} \phi_i^{(j)} x_{-i} , \\
b(x_{-1}, x_{-2}, \dots, x_{-p}) &= \sum_{j=1}^{\ell} \mathbf{1}_{R_j}(x_{-d})\, \sigma^{(j)} ,
\end{align*}
where p = max(q1, q2, …, q_ℓ) and R_j = [r_{j−1}, r_j), j ∈ {1, …, ℓ}.
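A two-regime special case of (7.68) is easy to simulate. The sketch below is not from the book; the function name and coefficients are invented for the illustration:

```python
import numpy as np

def simulate_setar(phi1, phi2, sigma1, sigma2, r, n_steps, x0=0.0, rng=None):
    """Simulate a two-regime SETAR(1) chain, the special case of (7.68)
    with l = 2 regimes and q1 = q2 = d = 1:
        X_k = phi1 * X_{k-1} + sigma1 * Z_k   if X_{k-1} <  r
        X_k = phi2 * X_{k-1} + sigma2 * Z_k   if X_{k-1} >= r
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.empty(n_steps)
    x = x0
    for k in range(n_steps):
        if x < r:
            x = phi1 * x + sigma1 * rng.standard_normal()
        else:
            x = phi2 * x + sigma2 * rng.standard_normal()
        xs[k] = x
    return xs

# Both regimes are contracting (|phi| < 1), so the chain does not explode.
xs = simulate_setar(0.5, -0.8, 1.0, 2.0, r=0.0, n_steps=50_000,
                    rng=np.random.default_rng(2))
print(xs.mean(), xs.std())
```

The piecewise-linear regression function of the simulated chain is exactly the a(·), b(·) pair written above, restricted to two regimes.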

We now consider the Foster-Lyapunov condition. Checking this condition is markedly more complicated, and it is in general difficult to provide conditions that apply easily to all models. We will therefore check the Foster-Lyapunov condition on a case-by-case basis.

Example 7.38 (Autoregressive model of order p). Consider an autoregressive process of order p, X_k = ∑_{j=1}^p φ_j X_{k−j} + Z_k for k ≥ 0, where {Zk, k ∈ N} is an i.i.d. sequence of random variables which is independent of the initial condition X0 = (X−1, …, X−p). Assume that

1. the distribution of Z0 has a Lebesgue component with density q w.r.t. the Lebesgue measure, q is positive and lower semi-continuous, and E|Z0| < ∞, where |·| denotes an arbitrary norm on R^p;
2. the roots of the prediction polynomial φ(z) = 1 − ∑_{j=1}^p φ_j z^j are outside the unit circle, i.e. φ(z) ≠ 0 for |z| ≤ 1.

In this case, F(x−1, …, x−p; z) = ∑_{j=1}^p φ_j x_{−j} + z and F^{−1}(x−1, …, x−p; y) = y − ∑_{j=1}^p φ_j x_{−j}. The function (x−1, …, x−p; y) ↦ F^{−1}(x−1, …, x−p; y) is continuous and J_{G^{−1}} is the identity matrix. Theorem 7.36 implies that every compact set of R^p is a Doeblin set.

We now establish the Foster-Lyapunov drift condition. Note that
\[
\mathbb{E}\big[\, |X_{k+1}| \,\big|\, X_k = x \,\big] \leq |||\Phi|||\, |x| + \mathbb{E}|Z_0| ,
\]
where Φ is the companion matrix (??), |||·||| is the matrix norm associated to the vector norm |·|, and Z0 = [Z0, 0, …, 0]′. Therefore, (7.9) will be fulfilled with V(x) = 1 + |x| provided that |||Φ||| < 1.

As shown in ??, the Markov chain {Xk, k ∈ N} has a unique invariant stationary distribution when ρ(Φ) < 1, where ρ(Φ) is the spectral radius of Φ (see ??). We know from ?? that this condition is equivalent to assuming that φ(z) ≠ 0 for |z| ≤ 1. For a given norm, the Foster-Lyapunov drift condition in general fails to establish this property, since it is easy to find examples where |||Φ||| > 1 and ρ(Φ) < 1. On the other hand, if ρ(Φ) < 1, there exists a matrix norm |||·|||_Φ such that |||Φ|||_Φ < 1, since there exists a norm such that |||Φ|||_Φ − ρ(Φ) is arbitrarily small (see ??). We will now construct such a norm. By the Schur triangularization theorem, there is a unitary matrix U and an upper triangular matrix ∆ such that Φ = U∆U′.


The diagonal elements of ∆ are the (possibly complex) eigenvalues of Φ, denoted λ1, …, λp ∈ C. For γ > 0, let D_γ = diag(γ, γ², …, γ^p) and compute
\[
D_\gamma \Delta D_\gamma^{-1} =
\begin{pmatrix}
\lambda_1 & \gamma^{-1} d_{1,2} & \gamma^{-2} d_{1,3} & \cdots & \gamma^{-p+1} d_{1,p} \\
0 & \lambda_2 & \gamma^{-1} d_{2,3} & \cdots & \gamma^{-p+2} d_{2,p} \\
0 & 0 & \lambda_3 & \cdots & \gamma^{-p+3} d_{3,p} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \gamma^{-1} d_{p-1,p} \\
0 & 0 & 0 & \cdots & \lambda_p
\end{pmatrix} .
\]
We can choose γ large enough so that the sum of the absolute values of the off-diagonal entries of D_γ∆D_γ^{−1} is less than ε, where 0 < ε < 1 − ρ(Φ). For a p × p matrix A = (a_{i,j}), denote by |||A|||_1 = max_{1≤j≤p} ∑_{i=1}^p |a_{i,j}| the maximum column sum matrix norm. If we set |||A|||_Φ = |||D_γU′AUD_γ^{−1}|||_1, then we have constructed a matrix norm such that |||Φ|||_Φ < ρ(Φ) + ε < 1. For x = (x−1, …, x−p) ∈ R^p, denote |x|_1 = ∑_{j=1}^p |x_{−j}|. Note that |Ax|_1 ≤ |||A|||_1 |x|_1. Define finally the vector norm |x|_Φ = |D_γU′x|_1. We get

\begin{align*}
|\Phi x|_\Phi &= \big|D_\gamma U' \Phi x\big|_1 = \big|D_\gamma U' \Phi U D_\gamma^{-1} D_\gamma U' x\big|_1 \\
&= \big|D_\gamma \Delta D_\gamma^{-1} D_\gamma U' x\big|_1 \leq |||D_\gamma \Delta D_\gamma^{-1}|||_1 \, |x|_\Phi \leq \big(\rho(\Phi) + \varepsilon\big)\, |x|_\Phi .
\end{align*}
Taking V(x) = 1 + |x|_Φ, we obtain
\begin{align*}
PV(x) = 1 + \mathbb{E}_x\big(|X_1|_\Phi\big) &\leq 1 + |\Phi x|_\Phi + \mathbb{E}|Z_1|_\Phi \\
&\leq \big(\rho(\Phi) + \varepsilon\big) V(x) + 1 - \big(\rho(\Phi) + \varepsilon\big) + |||D_\gamma U'|||_1\, \mathbb{E}|Z_1| ,
\end{align*}

showing that the Foster-Lyapunov condition is satisfied.
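The norm construction above can be checked numerically. The sketch below is not from the book: for simplicity it uses an eigendecomposition rather than the Schur decomposition, which amounts to assuming Φ is diagonalizable, and the function name and companion-matrix coefficients are invented for the illustration:

```python
import numpy as np

def contraction_norm_factor(Phi):
    """Return M such that the weighted norm |x|_Phi := |M x|_1 gives an
    operator norm equal to rho(Phi).  We diagonalize Phi (assuming distinct
    eigenvalues) instead of using the Schur decomposition of the text; the
    weighting idea is the same."""
    _, P = np.linalg.eig(Phi)          # Phi = P @ diag(lambda) @ P^{-1}
    return np.linalg.inv(P)

def col_sum_norm(A):
    """Maximum column sum matrix norm |||A|||_1."""
    return np.abs(A).sum(axis=0).max()

# Companion matrix of X_k = 0.5 X_{k-1} + 0.4 X_{k-2} + Z_k.
Phi = np.array([[0.5, 0.4],
                [1.0, 0.0]])
rho = max(abs(np.linalg.eigvals(Phi)))
print(rho)                 # spectral radius, about 0.93 < 1
print(col_sum_norm(Phi))   # 1.5 > 1: the plain norm exhibits no contraction

M = contraction_norm_factor(Phi)
# In the weighted norm, Phi acts as diag(lambda), so the induced
# operator norm collapses to the spectral radius.
print(col_sum_norm(M @ Phi @ np.linalg.inv(M)))   # about 0.93 < 1
```

This reproduces the phenomenon discussed in the text: the drift condition fails in the plain norm (|||Φ|||₁ > 1) but holds in a suitably weighted norm whose operator norm is arbitrarily close to ρ(Φ).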

Extensions of these techniques to more general functional autoregressive models should be considered on a case-by-case basis.

Example 7.39 (Nonlinear autoregressive models). Consider the p-th order nonlinear autoregressive process (NLAR(p)),
\[
X_k = a(X_{k-1}, \dots, X_{k-p}) + Z_k , \qquad (7.69)
\]
where {Zk, k ∈ N} is an i.i.d. sequence of random variables independent of X0 = [X−1, …, X−p]′. We assume that Z0 has a density which is bounded away from zero on compact sets, E|Z0| < ∞, a is bounded on compact sets, and
\[
\lim_{|x|\to\infty} \frac{|a(x) - \phi' x|}{|x|} = 0 ,
\]
where |·| is the Euclidean norm and φ := [φ1, φ2, …, φp]′ is such that the polynomial 1 − φ1 z − ⋯ − φp z^p has no root in |z| ≤ 1. In words, this condition implies that the model is asymptotically linear and that the limiting linear model is geometrically ergodic. Define as in the previous example V(x) = 1 + |x|_Φ. We get
\begin{align*}
PV(x) &\leq 1 + |A(x)|_\Phi + \mathbb{E}[|Z_1|_\Phi] \leq 1 + |\Phi x|_\Phi + |A(x) - \Phi x|_\Phi + \mathbb{E}[|Z_1|_\Phi] \\
&\leq 1 + \left( \rho(\Phi) + \varepsilon + \frac{|A(x) - \Phi x|_\Phi}{|x|_\Phi} \right) |x|_\Phi + \mathbb{E}[|Z_1|_\Phi] ,
\end{align*}
where A(x−1, …, x−p) := [a(x−1, …, x−p), x−1, …, x−p+1]′ and Z1 = [Z1, 0, …, 0]′. Since limsup_{|x|→∞} (ρ(Φ) + ε + |A(x) − Φx|_Φ/|x|_Φ) = ρ(Φ) + ε < 1, and sup_{|x|≤R} PV(x) < ∞ for any R > 0, the Foster-Lyapunov condition is satisfied.

Example 7.40 (Nonlinear autoregressive models). Consider again the NLAR model (7.69) and assume that there exist λ < 1 and a constant c such that
\[
|a(x_{-1}, \dots, x_{-p})| \leq \lambda \max\{|x_{-1}|, \dots, |x_{-p}|\} + c . \qquad (7.70)
\]
We check that, under this condition, the model is geometrically ergodic. Put Xk = (Xk, …, Xk−p+1) and define the vector norm |x|_0 = |(x−1, …, x−p)|_0 = max(|x−1|, …, |x−p|). Using (??), the recursion may be rewritten as X0 = x, X1 = Z1 + A(X0) = Z1 + A(x), X2 = Z2 + A(X1) = Z2 + A(Z1 + A(x)) and, by induction, Xp = Zp + A(Xp−1) = Zp + A(Zp−1 + ⋯ + A(x)). Using (7.70), we obtain
\begin{align*}
|X_1| &\leq |Z_1| + c + \lambda \max\{|x_0|, \dots, |x_{-p+1}|\} \leq |Z_1| + c + \lambda |x|_0 , \\
|X_2| &\leq |Z_2| + c + \lambda \max\{|X_1|, |x_0|, \dots, |x_{-p+2}|\} \\
&\leq |Z_2| + c + \lambda \max\big\{|Z_1| + c + \lambda\max\{|x_0|, \dots, |x_{-p+1}|\},\, |x_0|, \dots, |x_{-p+2}|\big\} \\
&\leq |Z_2| + c + \lambda(|Z_1| + c) + \lambda \max\big\{\lambda\max\{|x_0|, \dots, |x_{-p+1}|\},\, |x_0|, \dots, |x_{-p+2}|\big\} \\
&\leq |Z_2| + c + \lambda(|Z_1| + c) + \lambda \max\big\{|x_0|, \dots, |x_{-p+2}|,\, \lambda|x_{-p+1}|\big\} \\
&\leq |Z_2| + c + \lambda(|Z_1| + c) + \lambda |x|_0 \leq |Z_2| + c + (|Z_1| + c) + \lambda |x|_0 ,
\end{align*}
and similarly,
\begin{align*}
|X_p| &\leq |Z_p| + c + \lambda(|Z_{p-1}| + c) + \lambda^2(|Z_{p-2}| + c) + \dots + \lambda^{p-1}(|Z_1| + c) \\
&\qquad + \lambda \max\big\{|x_0|, \dots, |x_{-p+2}|,\, \lambda|x_{-p+1}|\big\} \\
&\leq |Z_p| + c + \lambda(|Z_{p-1}| + c) + \dots + \lambda^{p-1}(|Z_1| + c) + \lambda |x|_0 \\
&\leq |Z_p| + c + (|Z_{p-1}| + c) + \dots + (|Z_1| + c) + \lambda |x|_0 .
\end{align*}
From the above inequalities and E|Zk| < ∞, we easily get
\begin{align*}
\mathbb{E}\big|X_p\big|_0 &= \mathbb{E}\big[\max(|X_p|, |X_{p-1}|, \dots, |X_1|)\big] \\
&\leq \mathbb{E}|Z_p| + c + (\mathbb{E}|Z_{p-1}| + c) + \dots + (\mathbb{E}|Z_1| + c) + \lambda |x|_0 \leq \lambda |x|_0 + c' ,
\end{align*}
where c′ is a constant. Thus, taking the test function V to be the norm V(x) = 1 + |x|_0, the Foster-Lyapunov condition is satisfied.
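The contraction E|X_p|_0 ≤ λ|x|_0 + c′ can also be observed empirically. In the sketch below, which is not from the book, the regression function `a` is a hypothetical choice satisfying (7.70) with c = 0, and all names and values are invented for the illustration:

```python
import numpy as np

P_LAGS, LAM = 3, 0.7

def a(x):
    """Hypothetical regression function satisfying (7.70) with c = 0:
    |a(x)| <= LAM * max(|x_-1|, ..., |x_-p|)."""
    return LAM * np.max(np.abs(x)) * np.tanh(x[0])

def max_norm_after_p_steps(x0, rng):
    """Advance the lag vector X_k = (X_k, ..., X_{k-p+1}) for p steps and
    return |X_p|_0, the maximum coordinate in absolute value."""
    x = np.array(x0, dtype=float)
    for _ in range(P_LAGS):
        new = a(x) + rng.standard_normal()    # X_k = a(X_{k-1},...) + Z_k
        x = np.concatenate(([new], x[:-1]))   # shift the lag vector
    return np.abs(x).max()

rng = np.random.default_rng(4)
x0 = [50.0, -40.0, 30.0]                      # initial state with |x|_0 = 50
est = np.mean([max_norm_after_p_steps(x0, rng) for _ in range(2000)])
# The Monte Carlo estimate of E|X_p|_0 should fall below
# LAM * |x|_0 + c' for a moderate constant c'.
print(est, LAM * 50.0)
```

Starting from a large initial state, the p-step average of the max norm lands near λ|x|_0 plus a noise-driven constant, as the inequalities above predict.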


Appendices




Index

V-Dobrushin coefficient, 169
P∗, 12
absorbing set, 14
Coupling, 151
Dobrushin coefficient, 154
Doeblin condition, 156
Doeblin set, 156
Drift conditions
  Geometric, 170
Dynamical System, 129
Ergodicity, 132
Hitting time, 45
induced chain, 51
Invariant-Event, 130
Log-linear Poisson autoregression, 35
m-skeleton, 8
Occupation time
  of set, 52
resolvent, 8
Resolvent kernel, see Transition
Return time, 45
set
  absorbing, 14
Stopping times, 45
total variation distance, 148
total variation norm, 148
Transition
  kernel
    resolvent, 8
    Sampled Chain, 8