Copyright ©2010 by Pearson Education, Inc.Upper Saddle River, New Jersey 07458
All rights reserved.
Statistics and Data Analysis for Nursing Research, Second EditionDenise F. Polit
Statistics and Data Analysis for Nursing Research
Second Edition
CHAPTER 10
Multiple Regression
Multivariate Statistics
• Multivariate statistics are a class of statistics that involve the analysis of at least three variables
• Multivariate statistics are computationally formidable, yet are an important and powerful tool
• One widely used multivariate statistical tool is multiple regression
Multiple Regression
• Multiple regression is an extension of simple regression that allows more than one predictor variable
• Most outcomes of interest to nurse researchers are multiply determined, so multiple regression is a powerful tool for better understanding relationships among variables
Multiple Regression Equation
• Like simple regression, the multiple regression equation for predicted values of the dependent variable (Y’) involves an intercept constant (a) and regression coefficients (b weights)—one for each predictor (Xs):
Y’ = a + b1X1 + b2X2 + ... + bkXk
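As a quick sketch, the prediction equation can be computed directly; the intercept, b weights, and predictor scores below are made-up illustrative numbers, not values from the text:

```python
import numpy as np

# Illustrative (hypothetical) values: intercept a, b weights, and one
# person's scores on two predictors X1 and X2
a = 10.0
b = np.array([2.5, 5.0])   # b1, b2
x = np.array([4.0, 3.2])   # X1, X2

y_pred = a + b @ x         # Y' = a + b1*X1 + b2*X2
```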
Least-Squares Criterion
• Multiple regression solves for a and the b weights using the least-squares criterion—the sum of the squared error terms (residuals) is minimized
• Regression coefficients are weights associated with a given predictor when the other predictors are in the equation
– Removal or addition of a predictor changes the b weights
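A minimal numpy sketch of the least-squares solution on simulated (not real) data also shows the point above: when a correlated predictor is removed, the b weight for the remaining predictor changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # predictors overlap
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Least-squares fit (minimizes the sum of squared residuals)
# with both predictors in the equation
A_full = np.column_stack([np.ones(n), x1, x2])
coef_full, *_ = np.linalg.lstsq(A_full, y, rcond=None)

# Remove x2: the b weight for x1 changes because x1 and x2
# "explain" overlapping variance in y
A_reduced = np.column_stack([np.ones(n), x1])
coef_reduced, *_ = np.linalg.lstsq(A_reduced, y, rcond=None)
```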
Standardized Equation
• Because of differences in measurement units among predictors, regression equations are often presented in standardized form—using z scores (mean = 0, SD = 1.0) rather than raw scores for the predictor variables
• z scores are weighted by standardized coefficients called beta weights (β)
• In standardized form there is no intercept constant; the intercept is always 0.0
zY’ = β1zX1 + β2zX2 + ... + βkzXk
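The claim that the standardized equation has a zero intercept can be checked numerically. This sketch uses simulated data (not from the text) and fits the regression after converting everything to z scores:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
Y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

# Convert predictors and DV to z scores (mean 0, SD 1)
zX = (X - X.mean(axis=0)) / X.std(axis=0)
zY = (Y - Y.mean()) / Y.std()

# Fit with an intercept column anyway: the fitted intercept
# comes out as zero (to numerical precision)
A = np.column_stack([np.ones(n), zX])
coef, *_ = np.linalg.lstsq(A, zY, rcond=None)
intercept, betas = coef[0], coef[1:]   # betas are the beta weights
```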
Multiple Correlation
• The multiple correlation coefficient (R) summarizes how well the independent variables, taken together, predict or “explain” a dependent variable (DV)
• R indicates the magnitude of the relationship among the variables—but not the direction
– R can range from .00 to 1.00; there are no negative values
Coefficient of Determination
• The most widely reported statistic in multiple regression is the square of R, R2
• R2 indicates the proportion of variance in the DV accounted for by the predictors, taken as a set
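R and R2 can be computed from the sums of squares; a sketch with simulated (illustrative) data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))
Y = 2.0 + X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
resid = Y - A @ coef

ss_total = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - np.sum(resid ** 2) / ss_total  # proportion of DV variance
R = np.sqrt(r_squared)                         # multiple correlation, 0 to 1
```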
Facts About R
• R cannot be lower than the highest bivariate correlation (r) between predictors and the DV
• Increments to R tend to decline as additional predictors are added, because predictors usually have redundancy—i.e., they “explain” overlapping variance because they themselves are intercorrelated
Adjustment to R2
• All chance fluctuations for R2 are in the direction of inflating its value, so an adjustment is often made—especially for small samples
• Adjusted R2 (sometimes called shrunken R2) lowers the value, using a formula that takes sample size and number of predictors into account
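One commonly used adjustment formula is 1 − (1 − R2)(N − 1)/(N − k − 1); sketched below as a small function, with illustrative values (R2 = .50, N = 100, k = 8):

```python
def adjusted_r2(r2, n, k):
    """Shrunken R^2: penalizes R^2 using sample size n and
    number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(0.50, n=100, k=8)   # lower than the raw R^2 of .50
```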
Statistical Control
• Multiple regression offers the possibility of statistical control over extraneous (confounding) variables
• Multiple regression coefficients indicate the number of units that the DV is expected to change for each unit change in a predictor when the effects of other predictors are held constant (i.e., controlled)
Partial Correlation
• Partial correlation is a measure of the relationship between a DV and a predictor (X1) while controlling for the effect of a third variable (X2)
Partial Correlation (cont’d)
• In the diagram, the partial correlation of X1 with Y, controlling for X2, is the area a / (a + d)
Semipartial Correlation
• Semipartial correlation summarizes the correlation between all of the DV and a predictor (X1), from which the third variable (X2) has been partialled out
• The effect of the extraneous variable is removed from X1 but not from the DV
Semipartial Correlation (cont’d)
• In the diagram, the semipartial correlation of X1 with Y, partialling out X2, is the area a / (a + b + c + d)
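Both coefficients can be computed by residualizing. In this sketch (simulated data, not from the text), X2 is regressed out of X1 for both statistics, but regressed out of Y only for the partial correlation:

```python
import numpy as np

def residualize(v, w):
    """Residuals of v after regressing it on w (with intercept)."""
    A = np.column_stack([np.ones(len(w)), w])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(size=n)                  # variable being controlled
x1 = 0.6 * x2 + rng.normal(size=n)       # predictor, correlated with x2
y = x1 + 0.8 * x2 + rng.normal(size=n)   # DV

# Partial r: x2 removed from BOTH x1 and y
partial = np.corrcoef(residualize(x1, x2), residualize(y, x2))[0, 1]
# Semipartial r: x2 removed from x1 only
semipartial = np.corrcoef(residualize(x1, x2), y)[0, 1]
```

The semipartial correlation is never larger in absolute value than the partial correlation, because the DV's variance is left intact in the denominator.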
Overall Significance Test
• The basic null hypothesis in multiple regression: R = .00
• The statistic to test for the significance of R is an F-ratio that contrasts sum of squares due to regression against sum of squares for error (residual variation)
F = (SSregression / dfregression) / (SSresiduals / dfresiduals)
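A sketch of the overall F test on simulated data, using scipy for the p value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 80, 3                              # cases and predictors
X = rng.normal(size=(n, k))
Y = 1.0 + X @ np.array([0.8, 0.0, -0.5]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ coef

ss_reg = np.sum((Y_hat - Y.mean()) ** 2)  # SS due to regression
ss_res = np.sum((Y - Y_hat) ** 2)         # SS for error (residuals)
df_reg, df_res = k, n - k - 1

F = (ss_reg / df_reg) / (ss_res / df_res)
p = stats.f.sf(F, df_reg, df_res)         # tests H0: R = .00
```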
Test for Added Predictors
• An F-ratio is also computed to test for the significance of changes to R when additional predictors are included in the equation
• The null hypothesis in this situation is that the increment to R is .00
Test for Individual Predictors
• The significance of individual predictors can be evaluated through t statistics
• The null hypothesis in this case is that the regression coefficients are .00
Strategies for Entering Predictors
• Predictor variables can be entered into regression equations in various ways
• Different approaches attribute overlapping variation differently
• Three main approaches:
– Simultaneous regression
– Hierarchical regression
– Stepwise regression
Simultaneous Regression
• Standard method is simultaneous regression, which enters all predictors into the equation simultaneously
• Regression coefficients then indicate the relationship between a predictor and the DV when all other predictors are taken into account
Simultaneous Regression (cont’d)
• Diagram illustrates how variability in Y is allocated to X1, X2, and X3 (shaded areas l, n, & p)
• Each predictor is assigned the portion of Y’s variability that it contributes uniquely (which equals squared semipartial correlations)
Hierarchical Regression
• Hierarchical regression involves entering predictors into the equation in blocks, in a series of sequential steps
• Order of entry is controlled by the researcher
• Method is useful when researchers consider some variables theoretically or causally prior to others
– Also useful when wishing to control one block of predictors before considering others
Hierarchical Regression (cont’d)
• Diagram illustrates situation in which variables X1 to X3 are entered in three successive steps
• Step 1: Areas l and m are attributed to X1
• Step 2: Areas n and o are attributed to X2
• Step 3: Only area p is attributed to X3
Stepwise Regression
• Stepwise regression involves entering predictors into the equation one at a time, in the order in which increments to R are greatest
• Statistical, rather than theoretical, criteria determine the order of entry
• The procedure is controversial—it should be considered exploratory, and should involve cross validation of the model
Stepwise Regression (cont’d)
• Diagram illustrates stepwise entry in three steps
• Step 1: Areas l & m are attributed to X1
• Step 2: Areas p and o are attributed to X3
• Step 3: Only area n is attributed to X2
Nature of the Variables in Regression
• Dependent variable: Should be interval or ratio level (or approximately interval)
• Independent variables can be:
– Ratio level
– Interval level or approximately so
– Properly coded nominal level
Nominal-Level Predictors
• Nominal-level variables typically have to be recoded for use in regression analysis
• Three primary approaches:
– Dummy coding
– Effect coding
– Orthogonal coding
Nominal-Level Predictors (cont’d)
• All three approaches involve creating c – 1 new variables, where c is the number of categories of the original variable (e.g., with four categories for marital status, three new variables are created)
Reference Groups
• All three coding options require that one category be omitted in the creation of the c – 1 new variables
• The omitted category is the reference group
• The reference group can be selected based on theoretical or conceptual grounds, but it is often the smallest category
Dummy Coding
• The most widely used method is dummy coding, which contrasts people in one category with everyone else
• Everyone is assigned codes of either 1 or 0 on all the new variables
• Example, marital status, original codes:
– 1 = married; 2 = divorced; 3 = widowed; 4 = single, never married
• In our example, assume “singles” (never-married people) are the reference group
Example of Dummy Coding
• Variable names are in top row, numbers are the codes for each variable
• MSTAT is the original variable
           MSTAT   MARR   DIVOR   WIDOW
Married      1       1      0       0
Divorced     2       0      1       0
Widowed      3       0      0       1
Single       4       0      0       0
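The coding scheme above can be sketched as a small function (variable names follow the table; "single," MSTAT = 4, is the omitted reference group):

```python
# Dummy coding for the marital-status example: 4 categories yield
# 3 new 0/1 variables; singles get 0 on all three
def dummy_code(mstat):
    return {
        "MARR":  1 if mstat == 1 else 0,
        "DIVOR": 1 if mstat == 2 else 0,
        "WIDOW": 1 if mstat == 3 else 0,
    }
```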
Example of Dummy Coding (cont’d)
• The three new variables (MARR, DIVOR, and WIDOW) could be used as predictors in regression analysis
• Each dummy-coded variable contrasts those in a given category against all those who are not
– E.g., the variable MARR contrasts all those who are married against all those who are not
• In this example, those who are single are defined by having 0s on all three new variables
Interpreting Dummy Codes
• With dummy-coded variables, the intercept term is the mean value on the DV for the reference group (when no other variables are in the analysis)
• The regression coefficient for a dummy variable represents the difference in the DV between the designated group and the reference group
Effect Coding
• Effect coding involves using codes of -1 rather than 0 for the reference group on the new variables
• With effect coding, the intercept is the grand mean on the DV; regression coefficients indicate the group’s mean relative to the grand mean, not the mean of the reference group
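A small numerical check of the intercept claim, using made-up DV scores for two balanced groups coded +1/-1 (with balanced groups the grand mean equals the unweighted mean of the group means):

```python
import numpy as np

y = np.array([10.0, 12.0, 11.0, 20.0, 18.0, 19.0])   # DV scores
x = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])      # effect codes

A = np.column_stack([np.ones(6), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b = coef
# intercept equals the grand mean of y; b is the first group's
# mean relative to the grand mean
```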
Orthogonal Coding
• Orthogonal coding involves using a complex combination of codes to designate planned comparisons (e.g., comparing those who have lost a husband—divorced and widowed—to those who have not)
• Rarely used in nursing research
• Unless the author stipulates differently, dummy coding should be assumed when reading a report
Interaction Terms
• Interactions between different predictors can be designated in the equation by creating new interaction variables
• Simplest case: Multiplying two dummy-coded variables:
– Males = 1; Females = 0
– HIV positive = 1; HIV negative = 0
– Interaction variable: Males who are HIV positive = 1; All others = 0
• If the interaction term is significant: The effect of HIV status on the DV is conditional upon the person’s sex
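The sex-by-HIV-status example can be sketched directly (the four rows below are hypothetical cases):

```python
# Two dummy-coded variables for four hypothetical people
male = [1, 1, 0, 0]        # 1 = male, 0 = female
hiv_pos = [1, 0, 1, 0]     # 1 = HIV positive, 0 = HIV negative

# The interaction variable is simply the product of the two codes:
# 1 only for HIV-positive males, 0 for everyone else
interaction = [m * h for m, h in zip(male, hiv_pos)]
```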
Multiple Regression and Precision
• Confidence intervals around R2 can be built and yield useful information
• In practice, CIs around R2 are rarely presented
– Perhaps because they are not calculated within major statistical software packages
– Can be done through Internet resources
• Example: R2 = .50, N = 100, k = 8
95% CI = .37 to .63
Multiple Regression and Sample Size
• One method of estimating sample size requirements concerns the ratio of predictors to cases in the analysis
• Broad guideline: N should equal 50 + 8 times the number of predictors
– For example, with five predictors, there should be a minimum of 90 participants
• A better approach is power analysis
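The 50 + 8k guideline can be written as a one-line function:

```python
def min_sample_size(k):
    """Broad guideline: N = 50 + 8 * (number of predictors)."""
    return 50 + 8 * k

n_needed = min_sample_size(5)   # five predictors -> 90 participants
```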
Multiple Regression and Power Analysis
• Power analysis for multiple regression takes statistical criteria (α and β), estimated effect size, and number of predictors into account
• In the absence of effect size estimates, Cohen’s criteria are:
– Small effect, R2 = .02
– Moderate effect, R2 = .13
– Large effect, R2 = .30
• E.g., for five predictors with α = .05 and 1 – β = .80, the estimated sample size needs would be 643, 92, and 36, respectively
Relative Importance of Predictors in Regression
• Identifying which independent variable is the “best” predictor of a DV is a thorny issue because of overlapping variability in predictors
• Comparing b weights sheds no light
• There is no ideal solution, but researchers most often compare:
– Beta weights, because they are standardized
– Squared semipartial correlations, because they identify unique contributions
Suppression in Multiple Regression
• Suppression is a phenomenon that can occur when a predictor variable obscures, suppresses, or alters a relationship between other predictors and the DV because of overlapping variability
• Can lead to some puzzling results—for example, a variable that has a positive r with the DV could end up with a negative regression coefficient
Multicollinearity
• Multicollinearity is a problem that can occur when predictors are too highly intercorrelated
• Can yield unstable and misleading regression results
• Avoid using two predictors whose correlation is .85 or higher
• Multicollinearity can be tested by computing a tolerance, which ranges from .0 to 1.0
– The higher the tolerance, the better; the default tolerance for excluding a variable in SPSS is .0001
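Tolerance can be sketched as 1 − R2 from regressing each predictor on the others. In this simulated example, x2 is deliberately built to be nearly collinear with x1:

```python
import numpy as np

def tolerance(X, j):
    """1 - R^2 from regressing predictor j on the remaining predictors.
    Low tolerance signals multicollinearity."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    ss_tot = np.sum((target - target.mean()) ** 2)
    return np.sum(resid ** 2) / ss_tot

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                     # independent predictor
X = np.column_stack([x1, x2, x3])

tol_x1 = tolerance(X, 0)   # low: x1 is largely predictable from x2
tol_x3 = tolerance(X, 2)   # high: x3 overlaps little with the others
```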
Multiple Regression Assumptions
• Multiple regression used inferentially to estimate population values relies on several assumptions
• Multivariate normality—Each variable and all linear combinations of them are assumed to be normally distributed
• Linearity—There is a straight-line relationship between pairs of variables
• Homoscedasticity—Variability in scores for one variable similar at all values of another
Regression Assumptions (cont’d)
• Independence of errors—Errors of prediction are assumed to be independent of each other
• Main tool for exploring violations of assumptions: Residual scatterplots that plot errors of prediction on one axis against predicted values of the DV on the other
Residual Scatterplots
• When assumptions for multiple regression are met, residuals are distributed in an approximate rectangle, with heavy clustering of residuals along a center line—as in this diagram
Residual Scatterplots (cont’d)
• Residual scatterplot when assumption of multivariate normality is violated
• Distribution of residuals is skewed
Residual Scatterplots (cont’d)
• Residual scatterplot when assumption of linearity is violated
• Relationship between residuals and predicted values of Y is not linear
Residual Scatterplots (cont’d)
• Residual scatterplot when assumption of homoscedasticity is violated
• Variation in error terms is not consistent across all values of Y’
Computers and Multiple Regression
• Researchers always use statistical software for multiple regression
• Software packages allow many options, and produce extensive output
• In SPSS, use Analyze → Regression → Linear
SPSS Linear Regression Analysis
• Insert the dependent variable
• Then specify the Independents (predictors)
• For simultaneous regression, set Method to Enter
SPSS Regression Analysis (cont’d)
• For hierarchical regression, enter variables in different blocks
• Stepwise is another option for Method, using the dropdown menu
SPSS Regression Analysis (cont’d)
• Click Statistics pushbutton on main dialog box to get options for statistical output
• Important options include:
– Estimates
– Model fit
– Part/partial correlations
– Collinearity diagnostics
SPSS Model Summary Table
• In SPSS, one main panel is the Model Summary panel
• In simultaneous regression, there is only one model—the regression results when all predictors are in the equation
Model   R       R Square   Adjusted R Square   Std. Error of Estimate
1       .940a   .883       .852                .170

aPredictors: (Constant), Motivation scores, GRE Quant, Undergrad GPA, GRE Verbal
SPSS Model Summary Table (cont’d)
• In hierarchical or stepwise regression, there are multiple models—one for each step in the regression (abbreviated table)
Model   R      R Square   Adj R Sq   Std Err. of Est.   R Sq Change   Sig. F Change
1       .866   .751       .737       .227               .751          .000
2       .912   .832       .812       .192               .081          .011
3       .937   .877       .854       .169               .045          .027
SPSS Coefficients Table
• For each model, SPSS summarizes the regression equation (abbreviated table; b and SE are the unstandardized coefficients)

Model 1         b         SE      Beta    t        Sig
Constant        -1.215    .446            -2.727   .016
Undergrd GPA     .672     .200    .460     3.364   .004
GRE Verbal       .0031    .001    .457     3.189   .006
GRE Quant       -.00067   .001   -.113     -.898   .383
Motivation       .0117    .005    .268     2.307   .036