Polit ln ch09
Copyright ©2010 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458
All rights reserved.
Statistics and Data Analysis for Nursing Research, Second Edition, Denise F. Polit
Statistics and Data Analysis for Nursing Research
Second Edition
CHAPTER 9
Correlation and Simple Regression
Pearson’s r
• A descriptive index that summarizes magnitude and nature (direction) of a relationship between two variables in a sample
• Can also be used to make inferences about relationships in the population
Basic Hypotheses
• Correlational hypotheses are about ρ (rho), the population correlation coefficient
• Basic null hypothesis: rho is zero
– H0: ρ = .00
• The alternative (nondirectional) hypothesis is the opposite:
– H1: ρ ≠ .00
Sampling Distribution
• The mean of a sampling distribution of the correlation coefficient is ρ, the population coefficient
• When the null hypothesis is true (when ρ = .00):
– The theoretical sampling distribution is centered on .00
– The sampling distribution is approximately normal
Assumptions and Requirements
• Pearson’s r is suitable for (a) interval- and ratio-level variables and (b) detecting linear relationships
• Pearson’s r can be used inferentially:
– If the variables have an underlying distribution that is bivariate normal (scores on X normally distributed for each value of Y)
– If values on both variables are homoscedastic (for each value of X, variability of Y scores about the same)
Testing Significance
• The computed value of r must be compared to a critical value from a table, given the degrees of freedom and an established significance criterion (α)
• For Pearson’s r:
df = N - 2
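The table lookup can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it borrows r = .955 and N = 10 from the chapter's midterm/final example, and it assumes a tabled two-tailed critical r of .632 for df = 8 at α = .05. The conversion of r to a t statistic is a standard equivalent test that the slides do not show.

```python
import math

def t_from_r(r, n):
    """Convert a sample correlation r (sample size n) to a t statistic, df = n - 2."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df

# Values borrowed from the chapter's illustration: r = .955, N = 10.
r_sample, n_sample = 0.955, 10
t_stat, df = t_from_r(r_sample, n_sample)

# Assumed table value: critical r for df = 8, two-tailed alpha = .05.
CRITICAL_R_DF8 = 0.632

significant = abs(r_sample) > CRITICAL_R_DF8
print(f"df = {df}, t = {t_stat:.2f}, significant at .05: {significant}")
```

Comparing |r| with the critical r and comparing |t| with the critical t for the same df lead to the same decision.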
r-to-z Transformation
• Testing differences between two correlations requires that the two correlation coefficients first be transformed using the r-to-z transformation
• The normal distribution can then be used, with appropriate formulas, to test the difference
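A minimal sketch of this procedure for two independent samples follows. The slides do not give the formulas, so the code assumes the usual Fisher transformation, z = atanh(r), and a standard error of sqrt(1/(n1 - 3) + 1/(n2 - 3)). The two correlations and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

def r_to_z(r):
    """Fisher r-to-z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (r_to_z(r1) - r_to_z(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical example: r = .50 in one sample of 100, r = .30 in another sample of 100.
z_diff, p_value = compare_independent_rs(0.50, 100, 0.30, 100)
print(f"z = {z_diff:.2f}, two-tailed p = {p_value:.3f}")
```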
Magnitude of Effect
• Pearson’s r provides direct information about the direction and magnitude of effects
• Pearson’s r can be directly used as the effect size index in meta-analysis
• But the magnitude of effect is more often presented as r²
Coefficient of Determination
• r² is sometimes called the coefficient of determination
• r² indicates the proportion of variability in one variable shared with or “explained by” variability in the other
• r² is analogous to eta²: It represents the ratio of explained variance to total variance:
– r² = SS_Explained ÷ SS_Total
Precision and Pearson’s r
• Confidence intervals can be built around the value of r to indicate the precision of the population estimate
• For example, with a sample of 50, the 95% CI around r = .26 is -.02 to .50
– This includes the possibility that the population correlation is zero (the null hypothesis)
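One common way to build such an interval is through the r-to-z transformation: transform r, add and subtract a normal-curve margin, and transform back. The sketch below assumes that approach (the slides do not state the method) and reproduces the example values for r = .26 and N = 50.

```python
import math
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Confidence interval for a correlation via the Fisher z transformation."""
    z = math.atanh(r)                           # r-to-z
    se = 1.0 / math.sqrt(n - 3)                 # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.tanh(z - crit * se)            # z-to-r
    upper = math.tanh(z + crit * se)
    return lower, upper

ci_lower, ci_upper = r_confidence_interval(0.26, 50)
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f}")  # 95% CI: -0.02 to 0.50
```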
Power and Pearson’s r
• In power analysis, r = effect size index
• Tables can be used to estimate sample size needs (to minimize the risk of a Type II error)
• As a last resort, small, medium, and large effects correspond to rs of .10, .30, and .50, respectively
– This corresponds to needed Ns of 785, 85, and 29 participants
Factors Affecting r
• The magnitude of r can be affected (often reduced) by:
– Having variables with restricted ranges of values
– Using groups at both extremes of a distribution of values
– Having outliers in the data
– Measuring the variables with instruments having low reliability (attenuation)
Nonparametric Correlations
• Nonparametric options can be used if the data are ordinal or if assumptions for Pearson’s r are seriously violated
• Spearman’s rho (rs): Based on ranks of the original data values
• Kendall’s tau (τ): A complex formula, sometimes a preferred index because of its statistical properties
• Both range from -1.00 through .00 to 1.00
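Because Spearman's rho is based on ranks, it can be sketched as Pearson's r computed on the ranked values, with tied values given the average of their ranks. The code below is an illustrative sketch with hypothetical data; the helper names are not from the textbook.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # mean rank for this run of ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r from deviations and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ssx = sum((xi - mx) ** 2 for xi in x)
    ssy = sum((yi - my) ** 2 for yi in y)
    return sp / math.sqrt(ssx * ssy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: perfectly monotonic but not linear.
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 4, 8, 16, 32]
print(round(spearman_rho(x_vals, y_vals), 3), round(pearson_r(x_vals, y_vals), 3))
```

With these values rho reaches 1.00 because the ordering is perfectly preserved, while Pearson's r falls short of 1.00 because the relationship is not linear.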
SPSS and Correlations
• Analyze → Correlate → Bivariate
• Move all variables of interest into the Variables slot
• Select Pearson’s r, Kendall’s tau, and/or Spearman’s rho
Regression
• Regression: Techniques used to analyze relationships between variables and to make predictions about values of variables
• Strong link between correlation and regression
Simple Linear Regression
• Simple linear regression involves regressing one variable (Y) on another variable (X)
• Y: The dependent (outcome) variable
• X: The independent variable, but often called the predictor variable in regression analysis
– The goal is to be able to predict new values of Y based on values of X
Linear Regression
• Linear regression builds on the equation for a straight line because the relationship between the two variables is assumed to be linear
– A straight line should yield the best “fit” of the data points in a scatterplot (a linear model)
Equation for a Straight Line
• Any straight line can be described by this equation:
Y = a + bX
• Y = Values on one variable
• X = Values on the other variable
• a = The intercept constant (the point at which the line crosses the vertical (Y) axis)
• b = The slope of the line
Regression Equation
• Identifies the straight line that runs through the scatterplot data with the best possible fit
Y’ = a + bX
• Y’ = Predicted value of Y
• X = Actual value of X
• a = Intercept constant
• b = The slope of the line, but in this context is called the regression coefficient
Solving for a and b
• The values of the intercept constant (a) and regression coefficient (b) in the regression equation are calculated using formulas that involve:
– Means
– Deviations from means
– Cross products of deviations of X and Y scores from their respective means
Illustration
• Textbook example: Predicting students’ final exam scores based on midterm scores:
Midterm scores: 2, 6, 5, 9, 7, 9, 3, 4, 1, 4
Final scores: 3, 7, 6, 8, 9, 10, 4, 6, 2, 5
• r = .955
• Regression equation:
Y’ = 1.5 + .90X
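The values above can be reproduced from the means, deviations, and cross products of the ten score pairs. The sketch below assumes the standard least-squares formulas, b = Σ(cross products) ÷ Σ(squared X deviations) and a = mean of Y minus b times mean of X; the variable names are illustrative.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y

n = len(midterm)
mean_x = sum(midterm) / n                   # 5.0
mean_y = sum(final) / n                     # 6.0

dev_x = [x - mean_x for x in midterm]       # deviations from the mean of X
dev_y = [y - mean_y for y in final]         # deviations from the mean of Y

sp_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # sum of cross products
ss_x = sum(dx * dx for dx in dev_x)                   # sum of squares for X
ss_y = sum(dy * dy for dy in dev_y)                   # sum of squares for Y

b = sp_xy / ss_x                  # regression coefficient (slope)
a = mean_y - b * mean_x           # intercept constant
r = sp_xy / math.sqrt(ss_x * ss_y)

print(f"r = {r:.3f}; Y' = {a:.3f} + {b:.3f}X")  # r = 0.955; Y' = 1.515 + 0.897X
```

Rounded to one and two decimals, the intercept and slope give the equation shown on the slide, Y’ = 1.5 + .90X.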
Graphic Representation
• The regression line crosses the Y axis at the intercept constant, a = 1.5
• The slope is such that for every 10 points on the X axis, you go up 9 on the Y axis
b = .90
Prediction and Regression
• Regression equations yield predictions of new values of Y based on known values of X
• E.g., for the equation, Y’ = 1.5 + .90X:
X Actual Y Predicted Y
1 2 2.4
5 6 6.0
9 10 9.6
Errors of Prediction
• Errors of prediction: Differences between actual and predicted values of Y:
– Symbolized as e
– Also called residuals
X    Actual Y    Predicted Y    e
1    2           2.4            -0.4
5    6           6.0            0.0
9    10          9.6            0.4
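The predicted values and residuals in this table can be checked with a short sketch that applies the rounded regression equation, Y’ = 1.5 + .90X, to the three cases shown. The function name is illustrative.

```python
def predict(x, a=1.5, b=0.90):
    """Predicted Y from the rounded regression equation Y' = a + bX."""
    return a + b * x

# (X, actual Y) pairs taken from the table.
cases = [(1, 2), (5, 6), (9, 10)]

# Each row: X, actual Y, predicted Y, residual e = actual - predicted.
table = [(x, y, predict(x), y - predict(x)) for x, y in cases]

for x, y, y_pred, e in table:
    print(f"X = {x}  actual = {y}  predicted = {y_pred:.1f}  e = {e:+.1f}")
```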
Least Squares
• The regression equation uses a least-squares criterion in solving for a and b
• The sum of the squared errors of prediction (Σe²) is minimized
• Standard regression is sometimes called ordinary least-squares (OLS) regression for this reason
Standard Error of Estimate
• Standard error of estimate: An index of how “wrong,” on average, a predicted value of Y is
• The larger the correlation coefficient between the two variables in the regression, the smaller the standard error of estimate
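The link between r and the standard error of estimate can be illustrated numerically. The sketch assumes the usual formula for simple regression, SE = SD of Y × sqrt((1 - r²)(N - 1) ÷ (N - 2)), which the slides do not show. It uses the sum of squares of the final exam scores from the chapter's illustration (60, with N = 10); the smaller r values are hypothetical comparisons.

```python
import math

def standard_error_of_estimate(r, sd_y, n):
    """Standard error of estimate for simple regression, given r, SD of Y, and N."""
    return sd_y * math.sqrt((1.0 - r * r) * (n - 1) / (n - 2))

n_cases = 10
sd_final = math.sqrt(60 / (n_cases - 1))   # SD of final scores (SS_Y = 60)

# Hypothetical smaller correlations compared with the chapter's r = .955.
see_values = {r: standard_error_of_estimate(r, sd_final, n_cases)
              for r in (0.30, 0.60, 0.955)}

for r, see in see_values.items():
    print(f"r = {r:.3f}  standard error of estimate = {see:.2f}")
```

For r = .955 the result rounds to .81, the value SPSS reports for this example.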
Proportion of Variability in Y
• As a proportion of all variability in Y scores, the squared residuals are “what is left” to be explained (residual variation), after the correlation between the two variables is taken into account:
Σe² ÷ (Total variability in Y) = 1 – r²
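This identity can be verified with the chapter's midterm and final exam scores. The sketch fits the least-squares line, sums the squared residuals, and compares their share of the total sum of squares in Y with 1 – r². Variable names are illustrative.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y

n = len(midterm)
mean_x, mean_y = sum(midterm) / n, sum(final) / n

sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
ss_x = sum((x - mean_x) ** 2 for x in midterm)
ss_y = sum((y - mean_y) ** 2 for y in final)   # total variability in Y

b = sp_xy / ss_x
a = mean_y - b * mean_x
r = sp_xy / math.sqrt(ss_x * ss_y)

# Sum of squared residuals around the fitted line.
sum_e2 = sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))

residual_share = sum_e2 / ss_y
print(f"residual share = {residual_share:.3f}, 1 - r^2 = {1 - r ** 2:.3f}")
```

Both quantities come to about .088, so roughly 9% of the variability in final exam scores is left unexplained by midterm scores.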
SPSS and Regression
• Analyze → Regression → Linear
• Commands will be explained in the next chapter
SPSS Output: Model Summary
Model    R       R Square    Adjusted R Square    Standard Error of Estimate
1        .955    .912        .901                 .81
• SPSS calculates an adjusted R square using a formula that adjusts for sample size and number of predictors
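The adjustment can be sketched as follows, assuming the commonly used formula, adjusted R² = 1 – (1 – R²)(N – 1) ÷ (N – k – 1), where k is the number of predictors. The slide does not print the formula, so treat it as an assumption; with R² = .912, N = 10, and k = 1 it reproduces the tabled value.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-square for n cases and k predictors (assumed standard formula)."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)

adj = adjusted_r_square(0.912, n=10, k=1)
print(f"Adjusted R Square = {adj:.3f}")  # Adjusted R Square = 0.901
```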
SPSS Output: Coefficients
            Unstandardized Coefficients    Standardized Coefficients                      95% Confidence Interval for B
Model       B        Std. Error            Beta                         t        Sig.     Lower Bound    Upper Bound
Constant    1.515    .556                                               2.727    .026     .234           2.796
Midterm     .897     .099                  .955                         9.106    .000     .670           1.124
Dependent variable: Final exam scores
• The information in the column “unstandardized coefficients” embodies the regression equation:
a (constant) = 1.515 and b (slope) = .897
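The Midterm row of this output can be approximated by hand. The sketch assumes the standard formulas for simple regression (standard error of b = standard error of estimate ÷ sqrt(sum of squares for X), t = b ÷ standard error of b) and a tabled critical t of 2.306 for df = 8 at the 95% level. None of these formulas appear on the slides, so they are labeled here as assumptions.

```python
import math

midterm = [2, 6, 5, 9, 7, 9, 3, 4, 1, 4]   # X
final = [3, 7, 6, 8, 9, 10, 4, 6, 2, 5]    # Y

n = len(midterm)
mean_x, mean_y = sum(midterm) / n, sum(final) / n
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
ss_x = sum((x - mean_x) ** 2 for x in midterm)

b = sp_xy / ss_x
a = mean_y - b * mean_x

# Standard error of estimate from the squared residuals, df = N - 2.
sum_e2 = sum((y - (a + b * x)) ** 2 for x, y in zip(midterm, final))
see = math.sqrt(sum_e2 / (n - 2))

se_b = see / math.sqrt(ss_x)      # standard error of the slope
t_b = b / se_b                    # t statistic for the slope

T_CRIT_DF8 = 2.306                # assumed table value, two-tailed alpha = .05
ci_low = b - T_CRIT_DF8 * se_b
ci_high = b + T_CRIT_DF8 * se_b

print(f"b = {b:.3f}, SE = {se_b:.3f}, t = {t_b:.3f}, "
      f"95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

Because the interval for b excludes zero, it agrees with the significant t test for the slope.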