Rejoinder

This article was downloaded by: [Queen Mary, University of London]
On: 06 October 2014, At: 18:31
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of the American Statistical Association
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/uasa20

Rejoinder
Nils Lid Hjort & Gerda Claeskens
Published online: 31 Dec 2011.

To cite this article: Nils Lid Hjort & Gerda Claeskens (2003), "Rejoinder," Journal of the American Statistical Association, 98:464, 938-945, DOI: 10.1198/016214503000000882

To link to this article: http://dx.doi.org/10.1198/016214503000000882

PLEASE SCROLL DOWN FOR ARTICLE

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Transcript of Rejoinder


938 Journal of the American Statistical Association, December 2003


Rejoinder
Nils Lid HJORT and Gerda CLAESKENS

We are honored to have our work read and discussed at such a thorough level by several experts. Words of appreciation and encouragement are gratefully received, and the many supplementary comments, thoughtful reminders, new perspectives, and additional themes raised are warmly welcomed and deeply appreciated. Our thanks also go to the editor, Francisco Samaniego, and his editorial helpers for organizing this discussion.

Space does not allow us to answer all of the many worthwhile points raised by our discussants, but in the following we make an attempt to respond to what we perceive as being the major issues. Our responses are organized by themes rather than by discussants. We shall refer to our two articles as "the FMA article" and "the FIC article."

1. THE LOCAL NEIGHBORHOOD FRAMEWORK

In our articles we chose to work inside a broad and general parametric framework, which, in the regression case, corresponds to our using, say,

    f_{i,true}(y) = f(y | x_i; σ_0, β_0, γ_0 + δ/√n);    (1)

see Section 2 in the FMA article and Section 2 in the FIC article. This draws partial criticism from Raftery and Zheng, who question its realism, as well as from Ishwaran and Rao, who argue that it does not yield a good framework for subset regression problems.

The local neighborhood framework (1) allows one to extend familiar standard iid and regression models (corresponding to having δ fixed at 0) in several parametric directions (corresponding to δ_1, …, δ_q allowed to be nonzero, for different envisaged departures from the start model), as exemplified in our articles. This may, in particular, be utilized for robustness purposes and sensitivity analyses and leads to a fruitful theory for model averaging and focused model selection criteria, as we have demonstrated.

In their Section 4 Raftery and Zheng mention two pro-(1) arguments before presenting their reservations. The main argument for working inside (1) is, however, that it leads to natural, general, and precise limit distribution results, with consequent approximations for mean squared errors and the like; the key is that variances and squared modeling biases become exchangeable currencies, both of size 1/n. For classes of estimators μ̂ of μ(θ, γ), including the submodel estimators μ̂_S = μ(θ̂_S, γ̂_S, γ_{0,S^c}), we have

    E_{θ,γ}{μ̂ − μ(θ, γ)}² = n^{-1} ρ_1(θ, √n(γ − γ_0)) + n^{-3/2} ρ_2(θ, √n(γ − γ_0)) + ⋯,    (2)

for example, under regularity conditions. Such expansions, written out here without the δ that Raftery and Zheng appear to dislike, would typically be valid uniformly over ‖γ − γ_0‖ ≤ const/√n balls. We view (2)-type results as a good reason for developing and presenting theory in terms of δ = √n(γ − γ_0), that is, using (1). Our articles have (in particular) provided formulas for ρ_1(θ, δ) here, the limiting risk for μ̂, whereas expressions for ρ_2(θ, δ) are harder to get hold of; see our response to Tsai's comments in Section 6. We have also noted, in FIC's Section 5.5, that approximations coming from using the leading term in (2)-type expansions hold with exactness for finite n for submodel estimators of means in linear regression.
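The "exchangeable currencies" point can be made concrete with a toy numerical check of our own (illustrative numbers only, not a computation from the articles): in the simplest normal-mean version of (1), the narrow estimator carries pure squared bias of size δ²/n while the wide estimator carries pure variance of size σ²/n, so both stabilize after multiplication by n.

```python
import numpy as np

# Toy check that squared modeling bias and variance are both of size 1/n
# in the local framework gamma_0 + delta/sqrt(n) (illustrative numbers only).
# True mean is delta/sqrt(n); the narrow estimator fixes it at 0 (pure bias),
# the wide estimator is the sample mean of n N(mean, 1) draws (pure variance).
delta = 2.0
rows = []
for n in (100, 1000, 10000):
    sq_bias_narrow = (delta / np.sqrt(n)) ** 2   # narrow model: squared bias
    var_wide = 1.0 / n                           # wide model: variance sigma^2/n
    rows.append((n, n * sq_bias_narrow, n * var_wide))
for n, b, v in rows:
    print(n, b, v)   # n * (squared bias) stays near delta^2 = 4, n * variance near 1
```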

Thus, Raftery and Zheng interpret us a little bit too literally at the end of their Section 4; as statisticians we do not believe that our model parameter γ changes value when our dataset passes from n = 100 to n = 101, but we do believe that limit theorems based on the (1) framework provide a lucid understanding and useful approximations for the given n. This comment also applies to our BMA investigations (FMA's Sec. 9), where priors and posteriors for (θ, γ) are transformed to priors and posteriors for (θ, δ). (A too-literal belief in sample-size-dependent parameters would clash with Kolmogorov consistency and other requirements for natural statistical models; see McCullagh 2002 and the ensuing discussion.)

A further strand of arguments supporting the view that many questions find their most natural solutions inside the γ − γ_0 = O(1/√n) framework is related to what we termed "tolerance radii" in FMA's Section 10.5. How much quadraticity, or variance heteroscedasticity, can the normal regression model tolerate in the sense that the simpler methods based on standard assumptions still give better results than the more cumbersome ones based on the larger models? How much autocorrelation can typical iid-based methods take? Such questions are nicely answered using the sample-size dependent magnifying glass


δ = √n(γ − γ_0), as touched on in Section 10.5 of the FMA article. Consider, for example, the (β_0, σ, β_1, λ) model of FIC's Section 4.1. The simple iid model Y_i ~ N(β_0, σ²) can tolerate the presence of a regression coefficient β_1 and a skewness parameter λ as long as

    √n |ω_1 β_1 + ω_2(λ − 1)| ≤ (k_{n,1} ω_1² + k_{n,2} ω_2²)^{1/2}.

This first-order asymptotic answer rests on a framework like (1) and depends also on the focus parameter under study via (ω_1, ω_2); see FIC's Section 4.1 for examples. Inside the ellipse k_{n,1} β_1² + k_{n,2}(λ − 1)² ≤ 1/n, all estimands will be better estimated using the simple N(β_0, σ²) model rather than using the formally correct four-parameter model.
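The tolerance-radius phenomenon is easy to see in a small Monte Carlo of our own design (a hypothetical setup, not one of the article's examples): fit a linear and a quadratic model to data with a quadratic departure of size c, and compare mean squared errors for the mean response at a focus point. Inside the radius (c small) the narrow linear fit wins; far outside it the ordering flips.

```python
import numpy as np

# Hypothetical tolerance-radius illustration: linear vs. quadratic fit for
# estimating the mean response at x0 = 0, as the true quadratic term c grows.
rng = np.random.default_rng(1)
n, sigma, reps, x0 = 50, 1.0, 2000, 0.0
x = np.linspace(-1.0, 1.0, n)

def mse_at_x0(c, degree):
    """Monte Carlo MSE of the degree-'degree' polynomial fit at x0."""
    errs = []
    for _ in range(reps):
        y = 1.0 + 0.5 * x + c * x**2 + rng.normal(0.0, sigma, n)
        mu_hat = np.polyval(np.polyfit(x, y, degree), x0)
        mu_true = 1.0 + 0.5 * x0 + c * x0**2
        errs.append((mu_hat - mu_true) ** 2)
    return float(np.mean(errs))

results = {(c, d): mse_at_x0(c, d) for c in (0.0, 1.0) for d in (1, 2)}
for (c, d), m in sorted(results.items()):
    print(f"c={c} degree={d} mse={m:.4f}")
# With c = 0 the simpler linear fit has the smaller MSE; with c = 1 the
# quadratic model wins, despite its extra estimation noise.
```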

Another benefit of our methodology, and the (1)-type framework, is the ability to compare model selection and model average strategies in a unified way, across situations, so to speak. There is a well-defined limit experiment, characterized by deterministic quantities τ_0 and K; the vector ω, which depends on the focus parameter; and an unknown δ for which one observes D ~ N_q(δ, K). Inference is then sought for ψ = ω^t δ. Thus, lessons learned, for example, for Poisson regression models can be carried over to, for example, logistic regressions. This is also in the Le Cam spirit of asymptotic equivalence of statistical experiments; see, for example, van der Vaart (1998) and Brown (2000) for a general discussion.
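The limit experiment is simple enough to simulate directly; the following sketch uses illustrative numbers of our own choosing for q, δ, K, and ω. One observes D ~ N_q(δ, K) and estimates ψ = ω^t δ by ω^t D, whose mean and variance are ω^t δ and ω^t K ω.

```python
import numpy as np

# Simulation of the limit experiment D ~ N_q(delta, K), focus psi = omega' delta.
# All numbers below are illustrative choices, not values from the articles.
rng = np.random.default_rng(0)
q, reps = 3, 200_000
delta = np.array([1.0, -0.5, 0.0])
K = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
omega = np.array([0.5, 1.0, -1.0])

D = rng.multivariate_normal(delta, K, size=reps)
psi_hat = D @ omega                      # natural estimator of psi
print(psi_hat.mean(), omega @ delta)     # both close to psi = omega' delta
print(psi_hat.var(), omega @ K @ omega)  # both close to omega' K omega
```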

Note that δ can be large in size in (1) and (2), so reading our articles as saying that we only care about γ being close to γ_0 is not correct. Also, when γ happens to be far away from γ_0, this will be picked up by the data via δ̂_full = √n(γ̂_full − γ_0), and most sensible model averaging schemes, including FIC and weighted FIC, will give weights close to 1 for the wide model. As, for example, Johnson hints at, other techniques might be needed to better assess the behavior and properties of model average methods in clearly noncontiguous situations, that is, when γ is far from γ_0.

We have seen that the O(1/√n) framework is canonical inside general parametric models with independence. This has to do with information increasing linearly with sample size divided by model complexity and variances being proportional to inverse information. Ishwaran and Rao mention Breiman's bagging, which indeed may be viewed in terms of model averaging. Some of the calculations in Section 2 of Bühlmann and Yu (2002) may be seen as special cases of our general FMA theory. They show that when averaging takes place over a large number of stumps, then (su)bagging is best analyzed inside an O(1/n^{1/3}) framework. A similar comment applies to some of the goodness-of-fit tests of Claeskens and Hjort (2003), where large classes of alternative models are being searched through.

2. TWO USES OF REGRESSION MODELS

Regression analyses have different aims on different occasions, and even the same dataset may be analyzed with different goals in mind. We have primarily taken the view that what matters most is the quality of the predictors and the precision of the focus estimators. Ishwaran and Rao, in contrast, equate the subset selection regression problem with finding the exact subset of nonzero elements among a vector of coefficients (β_1, …, β_K)^t. As a result, they partly criticize our methods for not being optimal for a task they were not set out to perform.

For many applications one would not care much if, say, β_7 = .01 rather than being exactly 0, and the additional estimation noise caused by including β̂_7 in the predictor formulas might worsen rather than enhance the precision. Ishwaran and Rao appear to say that it is the duty of any subset selection method to strive for inclusion of such a small β_7.

There are, of course, situations where detecting the nonzeroness of certain parameters is the main goal of an analysis. This could be a β_j coefficient in a linear regression setting, for example. Our theory works for such a focus parameter, too, because it may be expressed as β_j = E(Y | x + e_j, u) − E(Y | x, u), with e_j the jth unit vector. In an effort toward making the world slightly less unfair, Hjort (1994a) collected and analyzed data from world championships in sprint speedskating, focusing attention on the average difference d between 500-m results reached using the last inner track versus using the last outer track. The analysis essentially employed a bivariate mixed-effects regression model with seven parameters per championship (and it was necessary to fit the full seven-parameter model to make inferences about d). Only one parameter mattered to the delegates from 37 nations at the 1994 general assembly of the International Skating Union. They had to assess the potential nonzeroness of d and its implications, and actually needed to vote for or against the significance of the point estimate (which was d̂ = .06). The Olympic rules for sprint speedskating were, in fact, changed as a consequence of the statistical analysis; from Nagano 1998 onward, the athletes are forced to skate the 500-m distance twice. See also Hjort and Rosa (1999). This is an example of a sharply defined focus parameter where tools of FMA and FIC might be used.

We disagree with the way Ishwaran and Rao interpret the scope of our machinery for subset selection problems in regression at the end of their Section 1. The statistician is at the outset required to classify some parameters (say β_j's) as "protected" and other parameters (say γ_j's) as "uncertain"; our methods are then geared toward finding the best subsets of γ_j's to include, or to be averaged over. Our methods are certainly not "restricted to coefficients known to be 0," as Ishwaran and Rao charge in their point (a). First, the methods are well defined and can be applied regardless of the sizes of the γ_j's. This is also a reply to a comment by Raftery and Zheng, namely, that our (1) is "required by FMA"; FMA methods give algorithms that may be put to work regardless of (1)-type assumptions. Second, even though the mathematical results we have provided about various methods have utilized the γ_j = γ_{0,j} + δ_j/√n framework, the δ_j's may be large in size, as also commented on previously. Third, most sensible selection methods or averaging methods will pick out the widest model in cases where the γ_j's really are far from 0.

Regarding their point (b) (end of Sec. 1), it is fair to say that statistical modeling is and remains an art demanding skill and experience for its perfect execution, even with the advent of additional tools for automatization and diagnostics. The previous argument indicates that it may be rather harmless if a statistician labels a parameter a "γ" when it should rather have been a "β" (provided the selection or averaging scheme is among the decently robust ones, with low max risk; see FMA's Sec. 7); this also serves to counter their point (b). A similar comment applies to Raftery and Zheng's reservations (Sec. 4), having to do with


situations where "the coefficients for some nuisance variables are substantial, and those for others are small." In such cases the crafty modeler should take this into account, redesignating the substantial nuisance variable coefficients as protected.

In their Sections 1 and 2, Ishwaran and Rao argue that in most regression setups the γ_0 associated with uncertain (or nonprotected) variables must be 0. This is fine, is not surprising, and does not contradict our machinery or methodology. Our theory does allow γ_0 ≠ 0, too, but this would here correspond to known trends, which may be removed from the regression equation. We note that the FIC article has several examples where the canonical γ_0 is nonzero.

3. ESTIMATING MODEL ORDER

In some settings there is a natural order of complexity among candidate models, as with, for example, polynomial regression. Ishwaran and Rao (Sec. 3) study the problem of estimating the actual underlying order of the true model, that is, the unknown number k_0 where the coefficient vector is (β_1, …, β_{k_0}, 0, …, 0) and the first k_0 are strictly nonzero. As mentioned previously, in many situations estimating k_0 might not be a vital issue. Their Theorem 1 contrasts backward and forward selection schemes under some conditions and is of interest. We believe their theorem should and can be extended to more general settings, however.

We do think the assumption about the finiteness of fourth moments may be softened, although this is not crucial. The primary problem we see is their assumption that Σ_n = n^{-1} X^t X = n^{-1} Σ_{i=1}^n x_i x_i^t must equal the identity matrix I; this appears too restrictive. One may transform a regression model to achieve such orthogonality, but this would typically inflict a different ordering of new coefficients, losing the original motivation of nestedness. This makes it difficult to keep track of the original k_0. To avoid the problem one may keep the original model and, consequently, keep the k_0 as defined by the untransformed β vector, but accept the weaker assumption that Σ_n tends to a general positive-definite Q.

Under weak Lindeberg-type conditions, see, for example, Hjort and Pollard (1994) (in particular, it does not appear necessary to assume finite fourth moments), we then have √n(β̂ − β) →_d N_K(0, σ² Q^{-1}), with consequent √n(β̂_k − β_k) →_d N(0, σ_k²), say, where σ_k = σ (Q^{k,k})^{1/2}, with Q^{k,k} the (k, k) element of Q^{-1}. Thus, there is simultaneous convergence Z_{k,n} = √n β̂_k/σ̂_k →_d Z_k, say, for k ≥ k_0 + 1, where these are standard normals with correlations inherited from Q. Also, |Z_{k,n}| goes to ∞ in probability for k ≤ k_0. Defining the backward and forward model order estimates as in Ishwaran and Rao, one may now show that k̂_B →_d k_B and k̂_F →_d k_F, where

    Pr{k_B = k} = 0                                                  for k ≤ k_0 − 1,
                = Pr{Z_{k_0+1} ∈ J_{k_0+1}, …, Z_K ∈ J_K}            for k = k_0,
                = Pr{Z_k ∉ J_k, Z_{k+1} ∈ J_{k+1}, …, Z_K ∈ J_K}     for k ≥ k_0 + 1,

and

    Pr{k_F = k} = 0                                                  for k ≤ k_0 − 1,
                = Pr{Z_{k_0+1} ∈ J_{k_0+1}}                          for k = k_0,
                = Pr{Z_{k+1} ∈ J_{k+1}, Z_{k_0+1} ∉ J_{k_0+1}, …, Z_k ∉ J_k}  for k ≥ k_0 + 1.

Here J_k = (−z_{α_k/2}, z_{α_k/2}) is the acceptance interval for Z_{k,n}, with limit probability 1 − α_k. Ishwaran and Rao's Theorem 1 corresponds to the case of a diagonal Q matrix, where the Z_k's become independent.

In practice, the |Z_{k,n}|'s for k ≤ k_0 have not quite had time to go to ∞, for finite n, as Z_{k,n} has mean value about √n β_k/σ_k. The approximations afforded by the preceding limit theorem are easily too crude, particularly when the β_k's are small. It is again natural to use the local neighborhood parameterization, with, say, β_k = δ_k/√n. The limit distributions for k̂_B and k̂_F may be derived. There will, in particular, be positive probabilities for values k ≤ k_0 − 1. One finds, in fact,

    Pr{k_B = k} = Pr{δ_k/σ_k + Z_k ∉ J_k, δ_{k+1}/σ_{k+1} + Z_{k+1} ∈ J_{k+1}, …, δ_K/σ_K + Z_K ∈ J_K}

for k = 1, …, K, where δ_k = 0 for k ≥ k_0 + 1 and where Z_1, …, Z_K are standard normals with correlations coming from Q. There is a corresponding result for k_F. This creates a different picture than that of Ishwaran and Rao's Figure 1, which has been produced under conditions corresponding to having |δ_k| of infinite size for k ≤ k_0 (and Q diagonal).
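The shifted-normal formula for Pr{k_B = k} can be probed directly by Monte Carlo. The sketch below uses our own illustrative choices (K = 4, k_0 = 2, shifts δ_k/σ_k = 5 for k ≤ k_0, and a common correlation of .3 standing in for the role of Q) and applies the backward rule: k_B equals the largest k with the shifted Z_k outside J_k.

```python
import numpy as np

# Monte Carlo for the limit distribution of the backward order estimate k_B:
# shifted, correlated standard normals; all numbers are illustrative choices.
rng = np.random.default_rng(7)
K, z_crit, reps = 4, 1.96, 100_000
shift = np.array([5.0, 5.0, 0.0, 0.0])          # delta_k / sigma_k, so k0 = 2
R = np.full((K, K), 0.3) + 0.7 * np.eye(K)      # correlations playing Q's role
Z = rng.standard_normal((reps, K)) @ np.linalg.cholesky(R).T + shift

reject = np.abs(Z) > z_crit                     # shifted Z_k outside J_k
# k_B = largest rejected index (1-based), or 0 when nothing is rejected
k_B = np.where(reject.any(axis=1),
               K - np.argmax(reject[:, ::-1], axis=1), 0)
probs = np.bincount(k_B, minlength=K + 1) / reps
print(np.round(probs, 3))   # Pr{k_B = k} for k = 0, ..., K; mass piles up at k0 = 2
```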

Ishwaran and Rao comment that the model order parameter k_0 is not a smooth function of β and, as such, falls outside the standard regularity conditions used in our FMA and FIC articles. The resulting predictors, say μ̂_B = x^t β̂_B and μ̂_F = x^t β̂_F for a given covariate position x, are, however, amenable to our methods, viewed as estimators of μ = x^t β. The backward and forward predictors are model average methods and can be analyzed using the FMA methodology. Limit distributions are nonlinear mixtures of biased normals, and their performance may, in particular, be compared to that of AIC and FIC, as per Section 7 in the FMA article. We also note that the arguments and results alluded to here should generalize without serious difficulties to, for example, generalized linear models.

Ishwaran and Rao "have always wondered about" whether it is better to use forward or backward stepwise regression. They might perhaps be encouraged to continue their fruitful wondering. Even in cases when k̂_F is more successful than k̂_B as an estimator of k_0 (where, as we argued previously, the analysis and conclusion are less clearcut than they appear to be in their discussion), performance of the backward search would still be better than performance of the forward search for predicting x^t β in significant portions of the parameter space.

We use this opportunity to nod in agreement to comments made by Shen and Dougherty (Sec. 3), namely, that it is very useful when the list of candidate models can be restricted a priori, for both FIC and FMA. In situations with a nested sequence of models, as before, this means reducing the number of candidates from 2^q to q + 1. On the other hand, the list should be broad enough to reflect real modeling information, as viewed in conjunction with focus parameters. One possibility for shortening the queue of suitors is via suitable thresholding and reweighting, for example, including only the 10 most promising models as monitored by the FIC scores, or by the posterior probabilities inside a BMA setup. Our FMA theory also continues to be applicable for such strategies.


4. FMA VERSUS BMA

Raftery and Zheng appear as BMA's witnesses and deliver a strong case. In their earnest zeal they perhaps inadvertently risk classifying or portraying our FMA work as being anti-Bayesian, in spirit, intent, or result. That would be a case of incorrect classification. Our FMA bag comprises not only the compromise estimators of FMA's Section 4, but also averages of the generalized ridge estimators developed in Section 8, and these again are close relatives of the BMA methods, as explained in Section 9. When developing our FMA methodology, our points of motivation indeed included our wish to understand better the behavior of the BMA strategies.

Realizing that both "BMA" and "FMA" are big bags of methods, then, it is a little over-suggestive when Raftery and Zheng say that "BMA [was] generally found to have better performance" and that "FMA itself does not appear to yield optimal methods." Some BMA regimes are better than others, and some FMA schemes have optimality properties. One may, for example, work with model selection schemes that post-model selection use estimators that are minimax over, say, δ^t K^{-1} δ ≤ c-type regions, using for this step methods similar to Blaker's (2000). Also, as mentioned previously, some of the generalized ridge versions of FMA correspond (to the first order) to BMA schemes.

Several model average schemes may be added to the annotated list given in FMA's Sections 5 and 7. Ishwaran and Rao took up backward and forward model selection procedures, and as we explained previously these may be analyzed inside the FMA framework; in particular, their behavior may be analyzed using FMA's Theorem 4.1. Johnson might consider having his former life prolonged by revisiting his and his colleagues' robust Bayesian estimation methods, using the FMA apparatus to understand performance.

Cook and Li discuss sliced inverse regression and central subspace methods. Such methods are geared more toward dimension reduction than selection of subsets and may be compared to principal components regression (see, for example, Mardia, Kent, and Bibby 1979, chap. 8) and to partial least squares regression (see, for example, Helland 1990). With some work we believe versions of these dimension reduction methods may be characterized and analyzed as FMA methods. In these situations it would be more natural to compare performance in terms of suitably averaged prediction accuracy; see the next section.

As far as performance is concerned, Raftery and Zheng are perhaps right to arrest us for not paying enough attention to the existing BMA literature. They provide references to and give a summary of three main strands of results: general Bayes theory (along with studies of robustness to prior specifications), simulations, and cross-validation-type predictive performance. See also the concise and useful discussion in Clyde and George (2003). What we intended to point to in our introduction to the FMA paper was the surprising lack in the literature of what one may think of as "the fourth strand of results," namely, limit distribution statements. In mathematical statistics we are not quite satisfied with simulations and cross-validation and indications of good performance; we need precise limit distribution results. This is not only dictated by tradition and aesthetics, but gives practical mathematics, providing good approximations for precision measures as well as a tool for comparing the performance, say, of different BMA schemes. What is logistic regression without results about the limiting behavior of the likelihood methods? What is years of hands-on experience with averages without the central limit theorem?

Shen and Dougherty stress, along with Johnson and with Raftery and Zheng, as we have done, the necessity of securing a well-defined interpretation of the focus parameters (or variables) across models. In our framework this is taken care of via μ = μ(f), where f belongs to suitable submodels of the widest f(y; θ, γ) model. This requirement, when boomeranged back to BMA's watchtower, becomes the issue we raise in FMA's Section 1.1, namely, that BMA typically entails mixing together conflicting prior opinions about the focus parameters. Our discussants do not take up this point.

5. AVERAGE QUALITY OF PREDICTORS

We appreciate Raftery and Zheng's additional comments to and extended analysis of the low-birth-weight dataset. Our own analysis of this dataset was primarily intended as an illustration of the developed methods, as opposed to a full scientific report on low birth weights. This was also why we chose somewhat simple parameters as foci. Let us write these as p_1 = p(z_1), p_2 = p(z_2), and ρ = p_2/p_1, where z_1 and z_2 are the average covariate vectors for white and black mothers, respectively. We may agree with Johnson and with Raftery and Zheng that there are yet other parameters to focus on, with perhaps higher socio-biological relevance; again, our parameters were chosen for illustration and simplicity. We still believe that ρ has some merit, though. A litmus test for "being of interest" might be whether one can imagine a newspaper or magazine publishing a story about a finding concerning the parameter in question; here a news story sentence like "the average black mother has a 50% greater chance than the average white mother of giving birth to too small children" would appear to pass the test. Of course, one should with such a finding attempt to investigate further, including aspects of the covariate distributions.

Comments from Cook and Li as well as from Raftery and Zheng point to the usefulness of developing the FIC and FMA apparatus to assess prediction quality when averaged in suitable ways, rather than for one focus parameter at a time. We touch on this in the FIC article's Sections 5.6 and 7.2. Such averaging is particularly natural in regression models, where focus might be on the behavior of, say, μ̂(x, u), for a regression surface μ(x, u), for particular subregions of u for fixed x, and so on. We note that the theory and arguments also invite suitable weighted generalizations of the AIC.

To indicate how the machinery can be developed further, consider a linear regression setup with Y_i = x_i^t β + u_i^t γ + ε_i for i = 1, …, n, where the ε_i's are iid with mean 0 and standard deviation σ and where γ = δ/√n. The x_i's are protected, whereas elements of the u_i's may or may not be taken into the finally selected model. We study the average weighted prediction error

    E_n = n^{-1} Σ_{i=1}^n (ξ̂_i − ξ_i)² w(x_i, u_i),    (3)


where ξ_i = E(Y | x_i, u_i) = x_i^t β + u_i^t γ and ξ̂_i an estimator thereof, with w(x, u) a suitable weight function. We shall see that n E_n has a limit distribution under reasonable conditions. Let

    Σ_n = n^{-1} Σ_{i=1}^n (x_i; u_i)(x_i; u_i)^t = ( Σ_{n,00}  Σ_{n,01}
                                                      Σ_{n,10}  Σ_{n,11} ),

of size (p + q) × (p + q), assumed to be of full rank. Its inverse Σ_n^{-1} has blocks denoted by Σ_n^{ij}, and similarly for the smaller (p + |S|) × (p + |S|) matrix Σ_{n,S} with inverse Σ_{n,S}^{-1}. We assume that Σ_n → Σ as n increases, also of full rank. We let L_n = Σ_n^{11}, along with L_{n,S} = (π_S L_n^{-1} π_S^t)^{-1} and H_{n,S} = L_n^{-1/2} π_S^t L_{n,S} π_S L_n^{-1/2}. The matrices L_n, L_{n,S}, and H_{n,S} have limits L, L_S, and H_S. For the S subset estimator

    (β̂_S; γ̂_S) = Σ_{n,S}^{-1} n^{-1} Σ_{i=1}^n (x_i; u_{i,S}) Y_i,

we have

    √n (β̂_S − β; γ̂_S) →_d (C_S; D_S) = Σ_S^{-1} (Σ_{01} δ + M; π_S Σ_{11} δ + N_S),

where (M, N) ~ N_{p+q}(0, σ² Σ) and N_S = π_S N. We may write

    E_n = n^{-1} Σ_{i=1}^n (x_i^t β̂_S + u_{i,S}^t γ̂_S − x_i^t β − u_i^t γ)² w(x_i, u_i)
        = n^{-1} Σ_{i=1}^n [ (x_i; u_i)^t (β̂_S − β; γ̂_S − γ_S; −γ_{S^c}) ]² w(x_i, u_i)
        = (β̂_S − β; γ̂_S − γ_S; −γ_{S^c})^t Ω_n (β̂_S − β; γ̂_S − γ_S; −γ_{S^c}),

where Ω_n is the w-weighted version of Σ_n. Thus, provided only that Ω_n →_p Ω,

    n E_n →_d E = (C_S; D_S − δ_S; −δ_{S^c})^t Ω (C_S; D_S − δ_S; −δ_{S^c}).

Expressions for the mean of E may be found using tools of the FIC article.

When $w = 1$ in (3) we have $\Omega_n = \Sigma_n$ and a corresponding simplification for $E$. The limiting risk using $S$ can be shown to be
\[
E(E) = (p + |S|)\sigma^2 + \delta^t L^{-1/2}(I - H_S)L^{-1/2}\delta,
\]
using arguments as in FIC's Section 7.2. Let $D_n = \sqrt{n}\,\hat\gamma_{\mathrm{full}}$, which goes to an $N_q(\delta, \sigma^2 L)$. An unbiased risk estimator is
\[
\begin{aligned}
\widehat{\mathrm{risk}}_S &= (p + |S|)\hat\sigma^2
+ \mathrm{Tr}\bigl[L^{-1/2}(I - H_S)L^{-1/2}(D_n D_n^t - \hat\sigma^2 L)\bigr] \\
&= (p - q + 2|S|)\hat\sigma^2 + D_n^t L_n^{-1} D_n - D_n^t L_n^{-1/2} H_{n,S} L_n^{-1/2} D_n,
\end{aligned}
\]
where $\hat\sigma^2$ is the usual unbiased estimator of variance, using the full model. This leads to the following selection criterion: choose the subset with smallest value of
\[
\text{ave-FIC}(S) = \hat\sigma^2\bigl\{2|S| + n\hat\phi^t(I - H_{n,S})\hat\phi/\hat\sigma^2\bigr\},
\]
where $\hat\phi = L_n^{-1/2}\hat\gamma_{\mathrm{full}}$. This appears to be related both to Mallows' $C_p$ and to Cook and Li's Equation (2) (worked out there for the case of $p = 1$, $x_i = 1$, and $\sum_{i=1}^n u_i = 0$). Note that $\sqrt{n}\,\hat\phi/\hat\sigma \to_d N_q(\phi, I)$, where $\phi = L^{-1/2}\delta$. For other extensions of the Mallows criterion, and theory, see Birgé and Massart (2001).
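As a concrete illustration of the criterion, the following sketch computes ave-FIC$(S)$ over all $2^q$ subsets in a small simulated regression. It is our schematic reading of the formula above, not code from the article; the simulated design, the coefficient values, and the use of a symmetric matrix square root are all our own choices.

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(4)
p, q, n = 1, 3, 300
X, U = np.ones((n, p)), rng.standard_normal((n, q))
Z = np.hstack([X, U])
delta = np.array([2.0, 0.0, -1.0])               # local true gamma = delta/sqrt(n)
y = X @ np.array([1.0]) + U @ (delta / np.sqrt(n)) + rng.standard_normal(n)

coef, rss, *_ = np.linalg.lstsq(Z, y, rcond=None)
sigma2 = rss[0] / (n - p - q)        # usual unbiased full-model variance estimate
L_n = np.linalg.inv(Z.T @ Z / n)[p:, p:]
ev, V = np.linalg.eigh(L_n)
L_inv_half = V @ np.diag(ev ** -0.5) @ V.T
phi = L_inv_half @ coef[p:]          # phi-hat = L_n^{-1/2} gamma-hat_full

def ave_fic(S):
    # ave-FIC(S) = sigma2 * {2|S| + n phi'(I - H_{n,S}) phi / sigma2}
    H = np.zeros((q, q))
    if S:
        pi_S = np.eye(q)[list(S)]
        L_nS = np.linalg.inv(pi_S @ np.linalg.inv(L_n) @ pi_S.T)
        H = L_inv_half @ pi_S.T @ L_nS @ pi_S @ L_inv_half
    return 2 * len(S) * sigma2 + n * phi @ (np.eye(q) - H) @ phi

subsets = chain.from_iterable(combinations(range(q), k) for k in range(q + 1))
scores = {S: ave_fic(S) for S in subsets}
best = min(scores, key=scores.get)
print(best, scores[best])
```

For the full subset $H_{n,S} = I$, so its score reduces to $2q\hat\sigma^2$, which gives a quick sanity check on the implementation.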

We note that the preceding ideas and arguments may be used to find precise limit distributions for average prediction error variables of the type $\sum_{i=1}^n\{\hat\mu(x_i, u_i) - \mu(x_i, u_i)\}^2 w(x_i, u_i)$ in quite general regression models and for quite general model average estimators. Such results may, in particular, be used for model and subset selection purposes. One is quite free to choose weight schemes appropriate for the purpose. If one wishes to assess predictor quality for a fixed $x_0$, when averaged over $u$, one may insert $w(x_0, u)$ proportional to an estimate of the conditional density of $u$ given $x_0$. This might be a multinormal density or a kernel smooth over a window around $x_0$.

We think that developments as outlined previously might lead to useful "focused regression diagnostics" of different types. The comments of Cook and Li also point in this direction.

6. SECOND-ORDER CORRECTIONS

In our articles we have determined the limit distribution of $\Lambda_{n,S} = \sqrt{n}(\hat\mu_S - \mu_{\mathrm{true}})$ (as well as for more general estimators, such as the post-model-selection estimator). This gives the approximation
\[
\mathrm{risk}_n(S, \delta) = nE(\hat\mu_S - \mu_{\mathrm{true}})^2 \doteq E\Lambda_S^2, \tag{4}
\]
where $\Lambda_S$ is the limit variable. In FMA's Section 10.7 and FIC's Section 7.6, we mentioned the potential for suitable finite-sample corrections to the first-order results of type (4). We are glad that Tsai has taken up this challenge, providing what he terms "improved" and "corrected" versions of the FIC.

The exact bias and variance of $\hat\mu_S$ would often depend in complicated ways on the model and sample size; see, for example, Dukic and Peña (2003) for finite-sample analysis of some particular post-selection estimators in Gaussian models. Sometimes expansions for these might be worked out, however. Suppose in general terms that
\[
\begin{aligned}
E\Lambda_{n,S} &= B_1(S,\delta) + B_2(S,\delta)/\sqrt{n} + B_3(S,\delta)/n + o(1/n), \\
\mathrm{Var}\,\Lambda_{n,S} &= V_1(S) + V_2(S,\delta)/n + o(1/n)
\end{aligned}
\]
for suitable coefficients. Lemma 3.3 in the FMA article, in fact, gives expressions for the leading terms $B_1(S,\delta)$ and $V_1(S)$ and, hence, for the leading term $E\Lambda_S^2 = B_1(S,\delta)^2 + V_1(S)$ in (4). It then follows that
\[
\begin{aligned}
\mathrm{risk}_n(S,\delta) = {}& B_1(S,\delta)^2 + V_1(S) + 2B_1(S,\delta)B_2(S,\delta)/\sqrt{n} \\
&+ \{B_2(S,\delta)^2 + 2B_1(S,\delta)B_3(S,\delta) + V_2(S,\delta)\}/n + o(1/n).
\end{aligned}
\]
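The displayed expansion is simply "squared mean plus variance," with the arguments $(S,\delta)$ suppressed:

```latex
\begin{aligned}
\mathrm{risk}_n(S,\delta) = E\Lambda_{n,S}^2
 &= (E\Lambda_{n,S})^2 + \operatorname{Var}\Lambda_{n,S} \\
 &= \bigl(B_1 + B_2/\sqrt{n} + B_3/n\bigr)^2 + V_1 + V_2/n + o(1/n) \\
 &= B_1^2 + V_1 + 2B_1 B_2/\sqrt{n}
    + \bigl(B_2^2 + 2B_1 B_3 + V_2\bigr)/n + o(1/n).
\end{aligned}
```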


This shows that the second-order term to catch (and estimate) is $2B_1(S,\delta)B_2(S,\delta)/\sqrt{n}$. This necessitates finding an expression for $B_2(S,\delta)$.

Following Tsai, this requires taking the delta method one step further, using a second-order Taylor expansion. We do this in a somewhat different way. Starting with
\[
\hat\mu_S - \mu_{\mathrm{true}} = \mu(\hat\phi_S, \gamma_{0,S^c}) - \mu(\phi_{0,S}, \gamma_{0,S^c})
+ \mu(\theta_0, \gamma_0) - \mu(\theta_0, \gamma_0 + \delta/\sqrt{n}),
\]
we may split $\Lambda_{n,S}$ into two parts, with leading terms
\[
(\partial\mu/\partial\phi)^t\sqrt{n}(\hat\phi_S - \phi_{0,S})
+ \tfrac{1}{2}\sqrt{n}\,(\hat\phi_S - \phi_{0,S})^t \mu_{11,S}(\hat\phi_S - \phi_{0,S})
\]
and
\[
-(\partial\mu/\partial\gamma)^t\delta - \tfrac{1}{2}\delta^t\mu_{22}\delta/\sqrt{n}.
\]
In our notation $\mu_{11,S}$ is the $(p+|S|)\times(p+|S|)$ matrix of second-order derivatives of $\mu(\theta, \gamma_S, \gamma_{0,S^c})$ with respect to $(\theta, \gamma_S)$, whereas $\mu_{22}$ is the $q\times q$ matrix of second-order derivatives of $\mu(\theta, \gamma)$ with respect to $\gamma$. These derivatives are evaluated under the narrow model $(\theta_0, \gamma_0)$.
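The second of these two expansions can be verified symbolically in the scalar case. A small check of ours, writing $\varepsilon = 1/\sqrt{n}$ and using $\mu(\gamma) = e^\gamma$ as an arbitrary smooth focus parameter chosen purely for illustration:

```python
import sympy as sp

# Check  sqrt(n){mu(g0) - mu(g0 + delta/sqrt(n))}
#      = -(dmu/dgamma) delta - (1/2) mu'' delta^2 / sqrt(n) + O(1/n)
# with eps = 1/sqrt(n) and mu = exp as an illustrative focus parameter.
eps, d, g = sp.symbols('epsilon delta gamma0', positive=True)
mu = sp.exp
ser = sp.series((mu(g) - mu(g + d * eps)) / eps, eps, 0, 2).removeO()

lead = ser.coeff(eps, 0)     # the O(1) term
second = ser.coeff(eps, 1)   # the coefficient of 1/sqrt(n)
assert sp.simplify(lead + d * sp.diff(mu(g), g)) == 0
assert sp.simplify(second + d**2 * sp.diff(mu(g), g, 2) / 2) == 0
```

Any other smooth $\mu$ gives the same pattern, since only the first two Taylor coefficients of $\mu$ at $\gamma_0$ enter.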

To go further, we need
\[
E\sqrt{n}(\hat\phi_S - \phi_{0,S})
= J_S^{-1}\begin{pmatrix} J_{01} \\ \pi_S J_{11} \end{pmatrix}\delta
+ m_S(\delta)/\sqrt{n} + n_S(\delta)/n + \cdots,
\]
with suitable (but often cumbersome) expressions for $m_S(\delta)$ and $n_S(\delta)$ obtainable from work touched on by Tsai; see also Barndorff-Nielsen and Cox (1994, chaps. 5 and 6). This leads to
\[
B_2(S,\delta) = (\partial\mu/\partial\phi_S)^t m_S(\delta)
+ \tfrac{1}{2}\mathrm{Tr}(\mu_{11,S}J_S^{-1}) - \tfrac{1}{2}\delta^t\mu_{22}\delta.
\]
To summarize,
\[
\mathrm{risk}_n(S,\delta) = E\Lambda_S^2 + 2B_1(S,\delta)B_2(S,\delta)/\sqrt{n} + o(1/\sqrt{n}) \tag{5}
\]

provides a second-order corrected version of (4).

The previous treatment is related to, but not fully equivalent to, what Tsai does. He studies nonlinearity aspects in his Section 2 and bias of likelihood estimators in Section 3. It appears to us, from the preceding arguments, that it is necessary to combine both of these second-order aspects. If not, one risks catching one or two of the terms making up $B_2(S,\delta)$, but not all three, and a partial reparation might be worse than no reparation.

We would perhaps hesitate to affix the labels "improved" and "corrected" too firmly to Tsai's modified FICs. It is clear from the preceding discussion that there are several possibilities for such second-order approximations to the mean squared error of estimators. Also, one needs directly or indirectly to estimate $B_1(S,\delta)B_2(S,\delta)$ from data, and there are several paths to follow, for example, regarding wide versus narrow estimation of partial derivatives. Furthermore, this estimation step might cause additional variability that might take away the intended benefit. Such phenomena are well known in mathematical statistics. A second-order Edgeworth expansion might not be a genuine improvement over a first-order Edgeworth expansion, for example, or perhaps there is improvement only for very large sample sizes. All this serves to indicate that further studies are required before a general-purpose second-order FIC can be established.

We note that Tsai’s work, and presumably also the precedingdevelopment, is relevant also when it comes to assessing thebehavior of model average estimators.

7. ESTIMATORS FROM OTHER LIKELIHOODS

Tsai points out that $\gamma$ parameters sometimes are in focus, and we agree. Our FIC and FMA apparatus handles this nicely, because it covers all smooth $\mu(\theta, \gamma)$ parameters; Tsai appears to claim otherwise. With focus on $\gamma_j$ we find $\omega = -e_j$, with $e_j = (0, \ldots, 1, \ldots, 0)^t$ being the $j$th unit vector. We are free to form general model average estimators $\hat\gamma_j = \sum_S c(S \mid D_n)\hat\gamma_{j,S}$, where, incidentally, terms with $S$ not containing $j$ will be equal to 0. Using FMA's Theorem 4.1, we find
\[
\sqrt{n}\{\hat\gamma_j - (\gamma_{0,j} + \delta_j/\sqrt{n})\} \overset{d}{\to} \hat\delta_j(D) - \delta_j
\quad\text{for } j = 1, \ldots, q,
\]
and so on. The FIC can also be applied, and one may study simultaneous estimation of the full $\gamma$ vector. It is also natural to include the goodness-of-fit measure
\[
\hat\delta^t\hat K^{-1}\hat\delta = n(\hat\gamma - \gamma_0)^t\hat K^{-1}(\hat\gamma - \gamma_0)
\]
in the data analysis. It is a $\chi^2_q(\delta^t K^{-1}\delta)$ in the limit.
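The limiting law is easy to check by simulation: drawing $\hat\gamma$ from its limiting $N_q(\gamma_0 + \delta/\sqrt{n}, K/n)$ distribution, the statistic should average to $q + \delta^t K^{-1}\delta$, the mean of a noncentral $\chi^2_q(\delta^t K^{-1}\delta)$. A sketch with invented values for $\delta$ and $K$:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, reps = 3, 400, 20000
gamma0 = np.zeros(q)
delta = np.array([1.0, -0.5, 2.0])               # invented local-alternative vector
K = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.5]])                  # invented limit covariance
Kinv = np.linalg.inv(K)

# Draw gamma-hat from its limiting N(gamma0 + delta/sqrt(n), K/n) law and form
# the measure n (gamma-hat - gamma0)' Kinv (gamma-hat - gamma0).
ghat = rng.multivariate_normal(gamma0 + delta / np.sqrt(n), K / n, size=reps)
z = ghat - gamma0
stats = n * np.einsum('ij,jk,ik->i', z, Kinv, z)

# Mean of the limiting noncentral chi-squared: df plus noncentrality.
nc = delta @ Kinv @ delta
print(stats.mean(), q + nc)
```

The two printed numbers agree up to Monte Carlo error.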

There might be situations where the ordinary likelihood apparatus cannot be used, or can be expected to perform poorly, and where variations such as profile likelihoods, empirical likelihoods, and quasi-likelihoods may be helpful. This would call for extensions of our work. We do not think, however, that profiling is necessary, or that it leads to new results, inside our parametric $f(y, \theta, \gamma)$ framework. We are therefore puzzled by Tsai's elaborations in this regard; under weak conditions the $S$-model profile likelihood estimator of $\mu$ will simply be our old maximum likelihood estimator $\hat\mu_S$. Tsai's intricate definition of a new "random parameter" $\mu_{\mathrm{prof,true}}$ does not correspond to our more naturally defined $\mu_{\mathrm{true}}$.

In Hjort and Claeskens (2003) we report on extensions of our FIC and FMA work for model selection and model averaging inside the semiparametric Cox regression model. Focus parameters could take the form $\mu(\beta, H, z)$, involving the parametric as well as the nonparametric part of the model, as with the median time to survival for a patient with given covariates $z$. Our existing theory will be seen to go through without essential modifications as long as $\mu$ is a function of $\beta$ and covariates only, whereas such modifications are called for when it also involves $H$. Similarly, extensions may be envisaged for use in spatial models with covariates, inside particular formats of parameter estimation; in particular, we have in mind the pseudo-likelihood method of Besag (1974, 1977) for Markov random fields, the quasi-likelihood of Hjort and Omre (1994, sec. 3) for spatial correlation models, and various methods for observed and aggregated point processes reviewed in Richardson (2003).

8. SUPPLEMENTARY COMMENTS

8.1 When p and q Become Large

Our methodology has been developed under the classic asymptotics scenario where the number of parameters stays bounded as the sample size increases. Shen and Dougherty point out (in their Section 4) that the results might need modifications to apply when $p + q$ is large, as will happen in many potential applications. We agree. This needs further mathematical developments. We do believe, however, that our asymptotic results will continue to provide adequate descriptions and approximations even when $p + q$ grows with $n$, but slowly enough


to have $p + q = o(\sqrt{n})$. Establishing such results would need further work, but might use methods similar to those used in, for example, Portnoy (1988).

We take the opportunity to opine that if $p + q$ becomes too large, it should be reduced. If one has 1,000 covariates per patient, one does well to compress and synthesize these, using substantive prior knowledge along with statistical techniques, before throwing the dataset to a regression selector or averager. Also, methods such as principal components and partial least squares regression might easily perform better than subset-finding schemes.

8.2 Loss Functions and Aspects of Costs

Cook and Li point out that using limiting mean squared error will not always suffice for making the relevant conclusions, regarding, for example, model selection; see also the comments by Shen and Dougherty. In some cases there is a cost $k(S)$ associated with observing future data for regressors in index set $S$. With loss functions that suitably combine precision with cost, such as $n(\hat\mu - \mu)^2 + \alpha k(S)$, we would have
\[
E\,\mathrm{loss}_n(S) = nE(\hat\mu_S - \mu_{\mathrm{true}})^2 + \alpha k(S)
\to E\Lambda_S^2 + \alpha k(S).
\]

This might be estimated using a slight extension of the FIC, after which an optimal subset may be extracted.

We have favored limiting mean squared error as performance criterion, but might also have worked, for example, with $L_1$ loss, leading, however, to more complicated expressions and estimators for $E|\Lambda_S|$ and so on.

8.3 Handling Corner Parameters

Shen and Dougherty discuss a general four-parameter model where rate measurements are of the form $V(x_1, x_2 \mid \beta_1, \beta_2, \beta_3, \beta_4)$ plus observation error, with
\[
V = \frac{\beta_1 x_1}{\beta_2(1 + \beta_3 x_2) + x_1(1 + \beta_4 x_2)}.
\]
The case of $(\beta_3, \beta_4) = (0, 0)$ is the so-called Victor–Michaelis–Menten model for enzyme-mediated reactions. In fisheries research it is also well known as the spawner–recruit model, dating back to an influential article by Beverton and Holt (1957); see Gavaris and Ianelli (2002) and the engaging discussion in Smith (1994, chap. 8). Shen and Dougherty discuss aspects of modeling the $V$, in particular looking at the four possibilities in–in, in–out, out–in, out–out for $(\beta_3, \beta_4)$. This cannot be studied well without a clearer understanding of the error structure involved. That this is nontrivial and vital, and will vary widely with context, is clear from Ruppert, Cressie, and Carroll (1989). Shen and Dougherty allude to pretest methods, which decide on inclusion or exclusion of $\beta_3$ and $\beta_4$ on the basis of tests for their presence. We note that such schemes are again model average methods and fall inside our developed theory.
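For concreteness, here is the rate function in code (a direct transcription of the display above; the parameter names are ours), together with the reduction at the corner $(\beta_3, \beta_4) = (0, 0)$:

```python
def V(x1, x2, b1, b2, b3, b4):
    # Shen and Dougherty's four-parameter rate model; (b3, b4) = (0, 0)
    # recovers the two-parameter form b1*x1/(b2 + x1), in which the
    # second covariate x2 plays no role at all.
    return b1 * x1 / (b2 * (1 + b3 * x2) + x1 * (1 + b4 * x2))

# With the corner parameters switched off, x2 drops out entirely:
assert abs(V(2.0, 5.0, 1.0, 0.5, 0.0, 0.0) - 2.0 / 2.5) < 1e-12
assert V(2.0, 5.0, 1.0, 0.5, 0.0, 0.0) == V(2.0, 99.0, 1.0, 0.5, 0.0, 0.0)
```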

There might sometimes be situations where it is known a priori that, for example, $\beta_3 \ge 0$, $\beta_4 \ge 0$. The theory we have developed presupposes that $(\beta_3, \beta_4)$ is an inner point of the parameter space. To handle "corner problems," as here, one needs somewhat more intricate methods, which would depend more on the specifics of the problem. See Hjort (1994b) for one such example, concerned with compromise estimators when the $t$ family is used as an extension of the normal in, for example, regression settings. Similar problems emerge in models with variance components. The methods of Vu and Zhou (1997) appear relevant when attempting to generalize our results to corner parameters.

An opinion perhaps too rarely expressed, which we share, is that statisticians should be more eager to help develop good nonlinear regression models, as here. The comfort and ease with which we reach moderately adequate approximations and inference precision using the flexible machinery of (generalized) linear models may sometimes take the edge off our professional modeling creativity.

8.4 Nonnested Models

We have for the most part stayed inside a framework where the biggest model is thought to be correct. Cook and Li mention the problem of nonnested models. The simplest answer, perhaps, from a principled point of view, is that one might search for a bigger model formulation that encompasses both. Consider estimating the median, for example, including under view both the gamma and the log-normal models. One may then work with estimators of the type $\hat\mu = W\hat\mu_{\mathrm{gam}} + (1 - W)\hat\mu_{\mathrm{logn}}$, with weights somehow dictated by the data, for example, via goodness-of-fit measures, or via closeness of the two estimates involved to the nonparametric $\hat\mu_{\mathrm{nonpm}}$, that is, the sample median. Behavior and performance may be studied using our methods.
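One schematic implementation of such a weighted estimator, with assumptions flagged: the data are simulated, the fits use `scipy` maximum likelihood, and the closeness-based weight is a deliberately naive choice of ours, since the discussion above does not prescribe any particular weighting scheme.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.gamma(shape=3.0, scale=2.0, size=200)    # illustrative positive data

# Candidate median estimates from the two nonnested parametric models.
a, loc_g, scale_g = stats.gamma.fit(y, floc=0)
mu_gam = stats.gamma.ppf(0.5, a, loc=loc_g, scale=scale_g)
s, loc_l, scale_l = stats.lognorm.fit(y, floc=0)
mu_logn = stats.lognorm.ppf(0.5, s, loc=loc_l, scale=scale_l)

# One simple data-dictated weight: favor the model whose median estimate
# lies closer to the nonparametric sample median.
mu_nonpm = np.median(y)
d_gam, d_logn = abs(mu_gam - mu_nonpm), abs(mu_logn - mu_nonpm)
W = d_logn / (d_gam + d_logn)
mu_hat = W * mu_gam + (1 - W) * mu_logn
print(mu_gam, mu_logn, mu_hat)
```

Since $W \in [0, 1]$, the combined estimate always lies between the two parametric estimates; studying its risk under either model is exactly the kind of exercise our methods cover.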

There are examples in science where nonnested and somehow conflicting statistical theories are not easily resolved, of course. A controversy of some fame inside fisheries research, and one that has perhaps not yet been solved to satisfaction despite having been pondered for about a hundred years, is the Dannevig versus Hjort case. It is concerned with models for spawning, recruitment, migration, and development of fish populations. Dannevig essentially believed in a deterministic relationship between the number of recruits and the number of yolk-sack codfish larvae, whereas Hjort argued that it is the environmental conditions during the critical phases of development that play the more important roles. He was able to develop year-class assessment methods, collect relevant data, and utilize actuarial mathematical methods of the time to substantiate and refine his theories; cf., for example, Hjort (1914). See http://www.math.ntnu.no/~ingeol/bemata/, where a study of structured stochastic models has been launched, involving computer-intensive inference in biological marine systems, and the interesting discussion in Smith (1994) and Secor (2002). This may be an example where model averaging might be useful, in a nonnested setup, mixing predictions of, for example, next season's abundance (perhaps as a function of quota thresholds) using elements of both scientific models.

8.5 Interpreting FIC Numbers

The FIC scores have been developed as estimates of $n$ times the mean squared error of subset estimators (modulo an additive constant) and, as such, depend on the scale used. They may be made scale independent via, say,
\[
\mathrm{FIC}^*(S) = \widehat{\mathrm{FIC}}(S)/\hat\omega^t\hat K\hat\omega,
\]


as in FMA’s Section 5.3. This would make comparison and in-terpretation easier across applications.We would, in particular,have

FIC¤.full/ D 2

and

FIC¤.narrow/ D nfb! t.b°full ¡ °0/g2=b! t bKb!:

8.6 When Is ω Equal to 0?

We have seen that the behavior of model average estimators is critically determined by $\omega = J_{10}J_{00}^{-1}\,\partial\mu/\partial\theta - \partial\mu/\partial\gamma$. In particular, if $\omega = 0$, then all subset and model average estimators are asymptotically equivalent to the narrow model estimator; $\sqrt{n}(\hat\mu - \mu_{\mathrm{true}}) \to_d N(0, \tau_0^2)$ for all reasonable competitors. The typical situation leading to $\omega = 0$ is when the parameter does not depend on $\gamma$ and, in addition, $\theta$ and $\gamma$ are orthogonal parameters in the sense that their full model estimators are independent in the limit, that is, $J_{01} = 0$. Johnson asks whether $\omega$ may be 0 also in other situations. Here is one example, in the framework of the exponential-within-Weibull model of FMA's Section 4.4. Assume we wish to estimate the $\alpha$ quantile $\mu = \nu^{1/\gamma}/\theta$, where $\nu = -\log(1 - \alpha)$. Then calculations give $\omega = (\nu/\theta)\{-(1 - r) + \log\nu\}$. For estimating the $\alpha = .7826$ quantile, therefore, $\omega$ happens to be equal to 0.
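Reading $r$ as the Euler–Mascheroni constant (our assumption, which reproduces the stated quantile), setting $\omega = 0$ gives $\log\nu = 1 - r$, and hence:

```python
import math

r = 0.5772156649015329          # Euler-Mascheroni constant (assumed meaning of r)
nu = math.exp(1 - r)            # omega = 0  <=>  log(nu) = 1 - r
alpha = 1 - math.exp(-nu)       # since nu = -log(1 - alpha)
print(round(alpha, 4))          # -> 0.7826
```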

ADDITIONAL REFERENCES

Barndorff-Nielsen, O. E., and Cox, D. R. (1994), Inference and Asymptotics, London: Chapman & Hall.

Besag, J. (1974), "Spatial Interaction and the Statistical Analysis of Lattice Systems" (with discussion), Journal of the Royal Statistical Society, Ser. B, 36, 195–225.

(1977), "Efficiency of Pseudolikelihood Estimation for Simple Gaussian Fields," Biometrika, 64, 616–618.

Beverton, R. J. H., and Holt, S. J. (1957), "On the Dynamics of Exploited Fish Populations," Fisheries Investigations, Ser. 2, 19.

Birgé, L., and Massart, P. (2001), "Gaussian Model Selection," Journal of the European Mathematical Society, 3, 203–268.

Blaker, H. (2000), "Minimax Estimation in Linear Regression Under Restrictions," Journal of Statistical Planning and Inference, 90, 35–55.

Brown, L. D. (2000), "An Essay on Statistical Decision Theory," Journal of the American Statistical Association, 95, 1277–1281. [Also in Brown, L. D. (2002), Statistics in the 21st Century, eds. A. E. Raftery, M. A. Tanner, and M. T. Wells, London: Chapman & Hall/CRC Press.]

Claeskens, G., and Hjort, N. L. (2003), "Goodness of Fit via Nonparametric Likelihood Ratios," unpublished manuscript.

Clyde, M., and George, E. (2003), "Model Uncertainty," ISDS Technical Report 03-17, Duke University.

Dukic, V. D., and Peña, E. A. (2003), "Estimation After Model Selection in a Gaussian Model," Journal of the American Statistical Association, to appear.

Gavaris, S., and Ianelli, S. N. (2002), "Statistical Issues in Fisheries' Stock Assessments" (with discussion), Scandinavian Journal of Statistics, 29, 245–271.

Helland, I. S. (1990), "Partial Least Squares Regression and Statistical Models," Scandinavian Journal of Statistics, 17, 97–114.

Hjort, J. (1914), "Fluctuations in the Great Fisheries of Northern Europe," Rapports et Procès-Verbaux des Réunions du Conseil International pour l'Exploration de la Mer, 20, 1–228.

Hjort, N. L. (1994a), "Should the Olympic Sprint Skaters Run the 500 Meter Twice?" research report, Dept. of Mathematics, University of Oslo.

(1994b), "The Exact Amount of t-ness That the Normal Model Can Tolerate," Journal of the American Statistical Association, 89, 665–675.

Hjort, N. L., and Claeskens, G. (2003), "Model Averaging and Focussed Model Selection for Cox Regression," unpublished manuscript.

Hjort, N. L., and Omre, H. (1994), "Topics in Spatial Statistics" (with discussion), Scandinavian Journal of Statistics, 21, 289–357.

Hjort, N. L., and Rosa, D. (1999), "Who Won?" Speedskating World, 4, 15–18.

Mardia, K., Kent, J. T., and Bibby, J. M. (1979), Multivariate Analysis, London: Academic Press.

McCullagh, P. (2002), "What Is a Statistical Model?" (with discussion), The Annals of Statistics, 30, 1225–1308.

Portnoy, S. (1988), "Asymptotic Behavior of Likelihood Methods for Exponential Families When the Number of Parameters Tends to Infinity," The Annals of Statistics, 16, 356–366.

Richardson, S. (2003), "Spatial Models in Epidemiological Applications" (with discussion), in Highly Structured Stochastic Systems, eds. P. J. Green, N. L. Hjort, and S. Richardson, London: Oxford University Press, pp. 237–269.

Ruppert, D., Cressie, N., and Carroll, R. J. (1989), "A Transformation/Weighting Model for Estimating Michaelis–Menten Parameters," Biometrics, 45, 637–656.

Secor, D. H. (2002), "Historical Roots of the Migration Triangle," ICES Marine Science Symposia, 215, 329–335.

Smith, T. (1994), Scaling Fisheries: The Science of Measuring the Effects of Fishing, 1855–1955, Cambridge, U.K.: Cambridge University Press.

van der Vaart, A. (1998), Asymptotic Statistics, Cambridge, U.K.: Cambridge University Press.

Vu, H. T. V., and Zhou, S. (1997), "Generalization of Likelihood Ratio Tests Under Nonstandard Conditions," The Annals of Statistics, 25, 897–916.
