Polit ln ch09


Copyright ©2010 by Pearson Education, Inc. Upper Saddle River, New Jersey 07458

All rights reserved.

Statistics and Data Analysis for Nursing Research, Second Edition
Denise F. Polit

CHAPTER 9

Correlation and Simple Regression

Pearson’s r

• A descriptive index that summarizes magnitude and nature (direction) of a relationship between two variables in a sample

• Can also be used to make inferences about relationships in the population

Basic Hypotheses

• Correlational hypotheses are about ρ (rho), the population correlation coefficient

• Basic null hypothesis: rho is zero

– H0: ρ = .00

• The alternative (nondirectional) hypothesis is the opposite:

– H1: ρ ≠ .00

Sampling Distribution

• The mean of a sampling distribution of the correlation coefficient is ρ, the population coefficient

• When the null hypothesis is true (when ρ = .00):

– The theoretical sampling distribution is centered on .00

– The sampling distribution is approximately normal

Assumptions and Requirements

• Pearson’s r is suitable for (a) interval- and ratio-level variables and (b) detecting linear relationships

• Pearson’s r can be used inferentially:

– If the variables have an underlying distribution that is bivariate normal (scores on X normally distributed for each value of Y)

– If values on both variables are homoscedastic (for each value of X, variability of Y scores about the same)

Testing Significance

• The computed value of r is compared with the critical value in a table, given the degrees of freedom and the chosen significance criterion (α)

• For Pearson’s r:

df = N - 2
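As a rough illustration of the comparison described above, r can be converted to a t statistic with df = N - 2, and a critical t can be converted back into a critical r. Both conversion formulas are standard but are not shown on the slide. The function names are hypothetical, and the critical t of 2.306 (two-tailed α = .05, df = 8) is read from a t table, so it is an assumption of this sketch.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for a given critical t."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Illustrative values: r = .955 with N = 10, as in the chapter's exam-score example
t_stat = t_from_r(0.955, 10)
r_crit = critical_r(2.306, 10)   # 2.306 is an assumed tabled t for df = 8, alpha = .05
print(round(t_stat, 2), round(r_crit, 3))
```

If |r| exceeds the critical r (or, equivalently, |t| exceeds the critical t), the null hypothesis ρ = .00 is rejected at the chosen α.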

r-to-z Transformation

• Testing the difference between two correlations requires that both correlation coefficients first be transformed, using the r-to-z transformation

• The transformed values can then be compared using the normal distribution and the appropriate formulas
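The slide does not give the formulas, so the following is only a sketch under the usual assumption that the r-to-z transformation is Fisher's, z = 0.5 ln((1 + r) / (1 - r)), and that the two correlations come from independent samples. The function names and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """r-to-z transformation: 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """z test for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p value
    return z, p

# Hypothetical example: r = .50 in one sample of 100, r = .30 in another sample of 100
z, p = compare_correlations(0.50, 100, 0.30, 100)
print(round(z, 2), round(p, 3))
```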

Magnitude of Effect

• Pearson’s r provides direct information about the direction and magnitude of effects

• Pearson’s r can be directly used as the effect size index in meta-analysis

• But the magnitude of effect is more often presented as r²

Coefficient of Determination

• r² is sometimes called the coefficient of determination

• r² indicates the proportion of variability in one variable shared with or “explained by” variability in the other

• r² is analogous to eta²: It represents the ratio of explained variance to total variance:

– r² = SS_Explained ÷ SS_Total

Precision and Pearson’s r

• Confidence intervals can be built around the value of r to indicate the precision of the population estimate

• For example, with a sample of 50, the 95% CI around r = .26 is -.02 to .50

– This includes the possibility that the population correlation is zero (the null hypothesis)
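The slide's interval can be approximated with the Fisher r-to-z transformation. This is a sketch that assumes the common large-sample method (standard error 1 / sqrt(N - 3) on the z scale), which may differ from the exact procedure used in the textbook. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Approximate CI for a correlation via the Fisher z transformation."""
    z_r = math.atanh(r)                       # r-to-z
    se = 1 / math.sqrt(n - 3)                 # standard error on the z scale
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Build the interval on the z scale, then transform back to r
    return math.tanh(z_r - z_crit * se), math.tanh(z_r + z_crit * se)

lower, upper = r_confidence_interval(0.26, 50)
print(f"{lower:.2f} to {upper:.2f}")
```

Because the lower limit falls below zero, the interval is consistent with the slide's point that a population correlation of zero cannot be ruled out.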

Power and Pearson’s r

• In power analysis, r is the effect size index

• Tables can be used to estimate sample size needs (to minimize the risk of a Type II error)

• As a last resort, when no better estimate of the effect size is available, small, medium, and large effects can be taken as rs of .10, .30, and .50, respectively

– These correspond to needed Ns of 785, 85, and 29 participants
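The tabled Ns can be roughly reproduced with a Fisher z approximation. This sketch assumes a two-tailed α of .05 and power of .80, neither of which is stated on the slide, and the function name is hypothetical. The approximation gives values close to, but not necessarily identical to, the tabled ones.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation of r."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed criterion
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

# Conventional small, medium, and large effects
needed = {r: n_for_correlation(r) for r in (0.10, 0.30, 0.50)}
print(needed)
```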

Factors Affecting r

• The magnitude of r can be affected (often reduced) by:

– Having variables with restricted ranges of values

– Using groups at both extremes of a distribution of values

– Having outliers in the data

– Measuring the variables with instruments having low reliability (attenuation)

Nonparametric Correlations

• Nonparametric options can be used if the data are ordinal or if assumptions for Pearson’s r are seriously violated

• Spearman’s rho (r_s): Based on ranks of the original data values

• Kendall’s tau (τ): Has a more complex formula, but is sometimes the preferred index because of its statistical properties

• Both range from -1.00 through .00 to 1.00
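To make "based on ranks" concrete, Spearman's rho can be sketched as Pearson's r computed on the ranks of each variable, with tied values given the mean of their ranks. The function names and the data are hypothetical, and Kendall's tau is not covered here.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: perfectly monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

With these values rho reaches its maximum because the ordering is perfectly consistent, while Pearson's r is lower because the relationship is not linear.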

SPSS and Correlations

• Analyze → Correlate → Bivariate

• Move all variables of interest into Variables slot

• Select Pearson’s r, Kendall’s tau, and/or Spearman’s rho

Regression

• Regression: Techniques used to analyze relationships between variables and to make predictions about values of variables

• Strong link between correlation and regression

Simple Linear Regression

• Simple linear regression involves regressing one variable (Y) on another variable (X)

• Y: The dependent (outcome) variable

• X: The independent variable, but often called the predictor variable in regression analysis

– The goal is to be able to predict new values of Y based on values of X

Linear Regression

• Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear

– A straight line should yield the best “fit” of the data points in a scatterplot (a linear model)

Equation for a Straight Line

• Any straight line can be described by this equation:

Y = a + bX

• Y = Values on one variable

• X = Values on the other variable

• a = The intercept constant (the point at which the line crosses the vertical (Y) axis)

• b = The slope of the line

Regression Equation

• Identifies the straight line that runs through the scatterplot data with the best possible fit

Y’ = a + bX

• Y’ = Predicted value of Y

• X = Actual value of X

• a = Intercept constant

• b = The slope of the line, but in this context is called the regression coefficient

Solving for a and b

• The values of the intercept constant (a) and regression coefficient (b) in the regression equation are calculated using formulas that involve:

– Means

– Deviations from means

– Cross products of deviations of X and Y scores from their respective means
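The slide lists the ingredients but not the formulas. The standard least-squares expressions, which are presumably the ones the textbook has in mind, combine those ingredients as follows (sums run over all N cases):

```latex
b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}},
\qquad
a = \bar{Y} - b\,\bar{X}
```

The numerator of b is the sum of cross products of deviations, the denominator is the sum of squared deviations of X, and a uses only the two means and b.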

Illustration

• Textbook example: Predicting students’ final exam scores based on midterm scores:

Midterm scores: 2, 6, 5, 9, 7, 9, 3, 4, 1, 4
Final scores: 3, 7, 6, 8, 9, 10, 4, 6, 2, 5

• r = .955

• Regression equation: Y’ = 1.5 + .90X
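The reported values can be checked directly from the ten score pairs. This is a minimal sketch using only means, deviations, and cross products; the variable names are hypothetical, and the slide's equation rounds the results.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y

mean_x = sum(midterm) / len(midterm)
mean_y = sum(final) / len(final)

# Sums of cross products and of squared deviations from the means
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
sum_xx = sum((x - mean_x) ** 2 for x in midterm)
sum_yy = sum((y - mean_y) ** 2 for y in final)

r = sum_xy / math.sqrt(sum_xx * sum_yy)   # Pearson's r
b = sum_xy / sum_xx                       # regression coefficient (slope)
a = mean_y - b * mean_x                   # intercept constant
print(f"r = {r:.3f}, a = {a:.3f}, b = {b:.3f}")
```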

Graphic Representation

• The regression line crosses the Y axis at the intercept constant, a = 1.5

• The slope (b = .90) is such that for every 10 points along the X axis, the line rises 9 points on the Y axis

Prediction and Regression

• Regression equations yield predictions of new values of Y based on known values of X

• E.g., for the equation, Y’ = 1.5 + .90X:

X    Actual Y    Predicted Y
1    2           2.4
5    6           6.0
9    10          9.6

Errors of Prediction

• Errors of prediction: Differences between actual and predicted values of Y:

– Symbolized as e

– Also called residuals

X    Actual Y    Predicted Y    e
1    2           2.4            -0.4
5    6           6.0            0.0
9    10          9.6            0.4
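The table rows can be reproduced as a short sketch: each predicted value comes from Y’ = 1.5 + .90X, and each residual is the actual Y minus the predicted Y. The function name is hypothetical, and the rounded coefficients from the slide are used.

```python
def predict(x, a=1.5, b=0.90):
    """Predicted Y from the rounded regression equation Y' = a + bX."""
    return a + b * x

cases = [(1, 2), (5, 6), (9, 10)]   # (X, actual Y) pairs from the slide

rows = []
for x, actual in cases:
    predicted = predict(x)
    residual = actual - predicted    # e = actual Y - predicted Y
    rows.append((x, actual, round(predicted, 1), round(residual, 1)))

for row in rows:
    print(row)
```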

Least Squares

• The regression equation uses a least-squares criterion in solving for a and b

• The sum of the squared errors of prediction (Σe²) is minimized

• Standard regression is sometimes called ordinary least-squares (OLS) regression for this reason
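One way to see the least-squares criterion at work is to compare the sum of squared errors for the fitted coefficients with the sum for nearby lines. This sketch uses the chapter's exam-score data and the coefficients reported in the SPSS output; the perturbation sizes are arbitrary choices made for illustration.

```python
midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y

def sum_squared_errors(a, b):
    """Sum of squared errors of prediction for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))

# Coefficients reported for this example: a = 1.515, b = .897
best = sum_squared_errors(1.515, 0.897)

# Shift the intercept and slope in each direction (arbitrary amounts)
alternatives = [
    sum_squared_errors(1.515 + da, 0.897 + db)
    for da in (-0.5, 0.5)
    for db in (-0.1, 0.1)
]
print(round(best, 3), [round(alt, 3) for alt in alternatives])
```

Every shifted line produces a larger sum of squared errors than the fitted line, which is what the least-squares criterion guarantees.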

Standard Error of Estimate

• Standard error of estimate: An index of how “wrong,” on average, a predicted value of Y is

• The larger the correlation coefficient between the two variables in the regression, the smaller the standard error of estimate
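The slide does not show the formula. A common definition, assumed here, is the square root of the sum of squared residuals divided by N - 2. The sketch below computes it for the exam-score data and adds a hypothetical helper, `see_from_r`, that expresses the same quantity through r to show why a larger correlation means a smaller standard error.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y
n = len(midterm)

mean_x, mean_y = sum(midterm) / n, sum(final) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
     / sum((x - mean_x) ** 2 for x in midterm))
a = mean_y - b * mean_x

# Residuals around the fitted line, then the standard error of estimate
residuals = [y - (a + b * x) for x, y in zip(midterm, final)]
see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

def see_from_r(r, ss_total, n):
    """Same quantity written in terms of r: sqrt(SS_total * (1 - r^2) / (n - 2))."""
    return math.sqrt(ss_total * (1 - r ** 2) / (n - 2))

print(round(see, 2))
```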

Proportion of Variability in Y

• As a proportion of all variability in Y scores, the sum of the squared residuals is “what is left” to be explained (residual variation), after the correlation between the two variables is taken into account:

Σe² ÷ (Total variability in Y) = 1 – r²
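The relationship follows from splitting the total variability in Y into an explained part and a residual part. The short derivation below is a sketch that assumes the usual sum-of-squares notation, with r² defined as explained over total variability as on the Coefficient of Determination slide:

```latex
SS_{Total} = SS_{Explained} + \sum e^{2}
\quad\Longrightarrow\quad
\frac{\sum e^{2}}{SS_{Total}}
  = 1 - \frac{SS_{Explained}}{SS_{Total}}
  = 1 - r^{2}
```

For the exam-score example, r² = .912, so about .088 of the variability in final exam scores is left unexplained by midterm scores.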

SPSS and Regression

• Analyze → Regression → Linear

• Commands will be explained in the next chapter

SPSS Output: Model Summary

Model    R       R Square    Adjusted R Square    Standard Error of Estimate
1        .955    .912        .901                 .81

• SPSS calculates an adjusted R square using a formula that adjusts for sample size and number of predictors
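The slide does not state the adjustment formula. The sketch below assumes the widely used form 1 - (1 - R²)(N - 1) / (N - k - 1), where k is the number of predictors, and checks it against the output above; the function name is hypothetical.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R square for n cases and k predictors (assumed standard formula)."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Values from the Model Summary: R Square = .912, N = 10 students, 1 predictor
adj = adjusted_r_square(0.912, n=10, k=1)
print(round(adj, 3))
```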

SPSS Output: Coefficients

            Unstandardized           Standardized
            Coefficients             Coefficients                      95% Confidence Interval for B
Model       B        Std. Error      Beta            t        Sig.     Lower Bound    Upper Bound
Constant    1.515    .556                            2.727    .026     .234           2.796
Midterm     .897     .099            .955            9.106    .000     .670           1.124

Dependent variable: Final exam scores

• The information in the column “Unstandardized Coefficients” embodies the regression equation: a (constant) = 1.515 and b (slope) = .897
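The remaining columns can be approximately recovered from B and its standard error: t is B divided by its standard error, and the 95% confidence limits are B plus or minus a critical t times the standard error. In this sketch the critical t of 2.306 (df = 8, two-tailed α = .05) is taken from a t table and is an assumption, as is the function name. Because the inputs are the rounded values printed in the output, the results differ slightly from the SPSS figures.

```python
T_CRIT_DF8 = 2.306   # assumed tabled critical t for df = N - 2 = 8, alpha = .05

def t_and_ci(b, se, t_crit=T_CRIT_DF8):
    """Return (t, lower bound, upper bound) for a coefficient and its standard error."""
    t = b / se
    return t, b - t_crit * se, b + t_crit * se

const_t, const_lo, const_hi = t_and_ci(1.515, 0.556)   # Constant row
slope_t, slope_lo, slope_hi = t_and_ci(0.897, 0.099)   # Midterm row

print(round(const_t, 2), round(const_lo, 2), round(const_hi, 2))
print(round(slope_t, 2), round(slope_lo, 2), round(slope_hi, 2))
```

Neither interval includes zero, which agrees with both coefficients being significant at the .05 level in the Sig. column.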