Relationship Between Two Quantitative Variables...ØScatterplots ØCovariance,!Correlation...
Transcript of Relationship Between Two Quantitative Variables...ØScatterplots ØCovariance,!Correlation...
Ø Scatterplots
Ø Covariance,!Correlation
ØOutliers
Ø Linear!Regression
ØResiduals,!Variation!in!the!Model
Ø Lurking!Variables!and!Causation
Relationship Between Two Quantitative Variables
Lecture!5Sections!4.1!� 4.11
Scatterplot
• Scatterplot: graphical!display!of!the!relationship!between!two!quantitative!variables• Response Variable: variable!plotted!along!y-axis!that!we!are!trying!to!explain!or!predict
• Predictor Variable: variable!plotted!along!x-axis!that!we!are!using!to!explain!changes!about!the!response!variable
• Observations!plotted!as!ordered!pairs
Example: Response vs. Predictor
• Scenario:Want!to!know!if!a!student�s!verbal!SAT!score!gives!any!information!about!their!math!SAT!score!by!randomly!sampling!44!high!school!seniors
• Question: What!should!be!the!response!variable?!!What!should!be!the!predictor!variable?
• Answer:• Response: _________________________
• Predictor: _________________________
Describing a Scatterplot
• Direction: As!predictor!variable!increases�• Positive: response!tends!to!increase
• Negative: response!tends!to!decrease
• Neither: no!obvious!change!in!response
• Form:• Linear: response!tends!to!increase!at!about!the!same!rate!across!all!values!of!predictor
• Curved: rate!at!which!response!changes!depends!on!value!of!predictor
• No Pattern
• Strength: How!tightly!clustered!together!are!the!points?• Usually!described!as!strong,moderate,!or!weak.
Example: Describing a Scatterplot
• Scenario:Want!to!know!if!a!student�s!verbal!SAT!score!gives!any!information!about!their!math!SAT!score!by!randomly!sampling!44!high!school!seniors
• Question: How!should!we!describe!the!relationship?
• Answer: ______________________________• Math!scores!change!at!______________________________________________________________
• Math!scores!tend!to!_____________________!____________________________________________
Example: Describing a Scatterplot
• Scenario: Supermarket!wants!to!know!how!much!they!should!sell!a!gallon!of!milk!for.!!Change!price!every!day!for!3!weeks!and!record!number!of!units!sold.
• Question: How!should!we!describe!the!relationship?
• Answer: ______________________________• Sales!tend!to!_______________________________
• Sales!decline!_______________________________at!lower!values!than!at!higher!values• Not!________________________
Covariance
• Covariance: measure!of!joint!variability!between!two!quantitative!variables
!"# =$% & '$ (% & )( +*+ $, & '$ (, & )(
- & 1
• Sign!of!covariance!dictates!direction!of!relationship
• Covariance!is!unbounded:!values!range!from!&. to!.
• Problem:!Does!not!help!us!interpret!strength!of!relationship
Example: Covariance
• Scenario:Math!vs.!verbal!SAT!scores!on!left.!Comparison!of!sample!of!students�!midterm!and!final!exam!scores!on!right
• Question: Which!scatterplot!has!the!stronger!linear!relationship?
• Answer: _________________________________________________________• Points!are!more!_______________________________________________________
Example: Covariance
• Scenario:Math!vs.!verbal!SAT!scores!on!left.!Comparison!of!sample!of!students�!midterm!and!final!exam!scores!on!right
• Question: What!does!the!covariance!tell!us?
• Answer: __________________________________• Covariance!will!be!large!if!the!_______________!of!the!observations!are!large!regardless!of!how!______________________________________________
!"# = __________
!"# = ______
Correlation
• Correlation: measure!of!the!strength!and!direction!of!the!linear!relationship!between!two!quantitative!variables
• Two!Types:• Population Correlation: Denoted!by!/ (Greek!letter!�rho�)• Parameter!à Generally!unknown
• Sample Correlation:!Denoted!by!0• Statistic!à Good!approximation!of /
0 =!"#
!"!#
Sample!covariance
Standard!deviation!of!predictor!values Standard!deviation!of!response!values
Correlation Facts
• Bounded!between!-1!and!+1• Sign!dictates!direction!of!relationship!(positive or negative)
• Magnitude!dictates!strength!of!relationship• Rule of Thumb: If the magnitude is…
• Between 0.70 and 1.00, the strength of the relationship is strong.
• Between 0.40 and 0.70, the strength of the relationship is moderate.
• Between 0.10 and 0.40, the strength of the relationship is weak.
• Between 0.00 and 0.10, there is little to no relationship.
• Scatterplot!must!have!a!linear form!for!the!correlation!to!make!sense.• As!the!scatterplot!becomes!more!curved,!the!correlation!becomes!less!accurate!as!a!means!of!describing!the!relationship.
Example: Calculating Correlation
• Scenario: Comparing!height!and!weight!of!15!adult!males.
• Question: What!is!the!correlation!between!height!and!weight?
• Answer: __________________________________
• Question: How!should!we!describe!the!linear!relationship?
• Answer: __________________________________
Variable Mean Std. Dev. Covariance
Height (In.) 68.66 1.76 12.92
Weight (Lbs.) 192.91 16.55
Using Excel
Example: Approximating Correlations
• Scenario: Scatterplots!show!observations!with!predictors!from!1!to!25,!responses!from!0!to!100.
• Task: Sort!the!scatterplots!from!weakest!correlation!to!strongest.
• Answer: __________________
• Question: Approximately!what!are!the!correlations!displayed!in!each!scatterplot?
• Answers:
A. B. C.
0 = _____ 0 = _____ 0 = ______
Outliers
• Outlier: point!on!a!scatterplot!that!is!unusually!far!away!from!the!rest!of!the!data• Have!potential!to!drastically!change!value!and/or!direction!of!correlation
• If!an!outlier!exists,!report!correlation!with!and!without!it!included!in!the!calculation
Example: Outliers
• Scenario: Scatterplot!shows!final!scores!of!1st round!games!between!1!and!16!seed!in!NCAA!Tournament!the!last!2!years.
• Question: Which!observation!is!an!outlier?
• Answer: ___________________!à 1!seed:!____;!16!seed:!____• First!time!a!_____________________________________________
• Question: What!impact!does!this!point!have!on!the!correlation?
• Answer: Significantly!__________________• With: 0 = ________; Without: 0 = ________
• Takeaway: Outliers!can!______________________!what!appears!to!be!an!otherwise__________________________.
Virginia vs. UMBC
• Least Squares Regression Line: the!best!line!that!can!be!drawn!through!data!on!a!scatterplot• Summarizes!the!general!pattern!to!help!us!understand!how!the!variables!are!related
• Allow!us!to!make!predictions!about!the!response!(Y)!given!some!value!of!the!predictor!(X)
• Almost!never!perfect,!but!gives!us!a!good!idea!of!how!the!response!changes!as!the!predictor!changes
Least!Squares!Regression!Line
Least Squares Regression Line
Example: Preparing for Linear Regression
Necessary!for!finding!equation!of!regression!line
Not!necessary!for!finding!regression!line,!but!help!us!understand!scope!of!data
Least Squares Regression Line
• Least Squares Regression Line:
2( = 34 + 3%$
• Slope: 3% = 056
57
• Estimated!increase!in!prediction!given!one!unit!increase!in!the!predictor
• Intercept: 34 = )( & 3% '$• Predicted!value!of!response!variable!when!predictor!equals!0
Predicted!Value!of!Response:�(-hat�
Intercept Slope Value!of!predictor
Example: Calculating Slope and Intercept
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price
• Question: What!is!the!equation!of!the!regression!line?
• Answer:
• Slope: ______________________________________________
• Intercept: ______________________________________________
• Regression Line: _____________________________
Variable Mean Std. Dev. Correlation
Attractions (X) 30 10.23 .588
Admission Price (Y) 64 21.53
Example: Interpreting Slope and Intercept
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price.!!Equation!of!the!regression!line!is:
2( = 89:;< + 1:8>$
• Question: What!do!the!intercept!and!slope!tell!us?
• Answer:• Intercept: A!park!that!has!_________________!would!have!a!___________________!________________________________
• Slope: For!every!additional!________________________________________,!the!________________________________________________________________.
Warning: Make sure you include the word “predicted” or
“expected”. The regression line only approximates values.
Example: Making Predictions
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price.!!Equation!of!the!regression!line!is:
2( = 89:;< + 1:8>$
• Question: What!would!we!expect!to!pay!to!enter!an!amusement!park!with!25!attractions?
• Answer: _______________________________
__________________
Residuals
• Residual: the!difference!between!the!predicted!value!of!a!response!and!the!observed!value!from!the!data
? = ( & 2(
• Observations!that!are:• Close to!the!regression!line!will!have!small residuals
• Far away from!the!regression!line!will!have!large residuals
• Above the!regression!line!will!have!positive residuals
• Below the!regression!line!will!have!negative residuals
Residual!(Error) Observed!Value!
From!Data
Predicted!Value!From!
Regression!Line
Example: Residuals
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price.!!The!amusement!park!that!had!25!attractions!had!an!admission!price!of!$66.
• Question: What!is!the!residual?
• Answer: ___________________________________
• Question: What!does!the!residual!mean?
• Answer: Admission!price!is!_______________________!than!what!would!be!______________________________________________.• Observation!lies!___________________________________________
Assumptions in Linear Regression
• Quantitative Data Condition: Regression!only!applies!to!quantitative!data
• Linearity Assumption: The!relationship!between!the!predictor!and!response!is!fairly!linear.
• Outlier Condition: No!individual!point!drastically!changes!the!slope!of!the!regression!line.
• Independence Assumption: Observations!in!the!data!are!selected!independently!of!one!another.
• Equal Spread Condition: The!spread!of!the!observations!around!the!regression!line!is!consistent!across!all!values!of!the!predictor.
Equal Spread Condition
• Residual Plot: plot!of!predictor!values!against!the!corresponding!residual!for!each!observation• Want: Plot!with!no!discernable!patterns!� no!direction!and!no!shape
Equal!Spread!Condition!Satisfied Equal!Spread!Condition!Not!Satisfied
Magnitude of residual
depends on value of predictor
Sign of residual depends
on value of predictor
No patterns. Range of
residuals same across all
predictor values
Example: Residual Plot
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price.!!Residual!plot!shown!below.
• Question: What!does!the!residual!plot!reveal?
• Answer:We!would!expect!the!actual!admission!price!of!an!amusement!park!to!be!_________________________________________________!_____________________________________
• Question: Does!the!equal!spread!condition!appear!to!hold?
• Answer: ________• Residuals!appear!to!_________________!________________________________________!________________________________________
Standard Deviation of Residuals
• Standard Deviation of Residuals: measure!of!the!size!of!a!typical!residual
!@ =A?B
- & 8
• Just!like!the!�regular�!standard!deviation,!residuals!that!are:• Within!one!standard!deviation!of!the!regression!line!are!quite!common
• More!than!three!standard!deviations!from!the!regression!line!are!likely!outliers
Note: The standard deviation of the residuals is rarely (if ever)
calculated by hand. We’ll see how to use Excel later to get this value.
Example: Standard Deviation of Residual
• Scenario: The!admission!price!for!an!amusement!park!with!25!rides!was!$66.!!The!predicted!value!was!$57.80!and!the!standard!deviation!of!the!residual!is!$17.76.
• Question: How!unusual!was!our!observation?
• Answer: __________________________• Actual Residual: ____________
• Standard Deviation of Residual: ____________• Actual!residual!is!_________________!� observed!value!within!___________________________!_____________________________________
Variation in the Linear Model
• Two!extremes:• Perfect Linear Model: regression!line!perfectly!predicts!response!all!the!time;!all!residuals!would!be!zero
• “Useless” Linear Model: predictor!gives!no!information!about!response
•Most!linear!models!fall!somewhere!in!between• Predictor!yields!some!information!about!response,!but!predictions!are!not!perfect!and!result!in!some!residuals
Perfect Useless Typical
Variation in the Linear Model
• R-Squared: the!fraction!of!the!variability!in!the!response!(C)!that!is!explained!by!the!predictor!(D)• Calculated!by!squaring!the!correlation:!0B
• No!relationship!between!variables!if!0B = <
• Observations!in!straight!line!if!0B = 1
• Generally!reported!as!a!percentage
Note: Remainder of variation E1 & 0BF is due to natural fluctuations
of the response in the sample and comes from the residuals.
Example: R-Squared
• Scenario: Using!number!of!attractions!at!amusement!park!to!predict!the!admission!price.!!Correlation!is!.588.
• Question: What!is!the!R-squared!value?
• Answer: _____________________________________
• Question: What!does!the!R-squared!mean!in!context?
• Answer: _________!of!the!variability!_____________________!is!explained!by!_________________________________________________________• The!remaining!____________!is!due!to!___________________________________________!caused!by!other!factors!such!as:• ______________________
• ______________________
• ______________________
Using Excel
• To!do!regression!in!Excel,!you!must!install!the!Data!Analysis!ToolPak,!which!is!an!add-in!available!on:• Excel!2007,!2010,!2013,!and!2016!for!PC
• Excel!2016!for!Mac
• To!do!regression!in!Excel:• Choose!�Data!Analysis�!from!�Data�!tab!to!get!the!menu!below
Links!to!videos!are!posted!on!CourseWeb to!help!you!install
Using Excel
Note: Focus on the parts of the output boxed in red. We’ll worry about the rest
once we get to inference later in the semester.
Appropriateness of Regression
• To!determine!if!regression!is!appropriate:• Ensure!variables!are!quantitative and!observations!are!independent
• Check!the!scatterplot!for:
• Linearity
• Outliers
• Calculate!the!equation!of!the!regression!line
• Use!the!regression!line!to!calculate!the!residuals
• Create!the!residual!plot!and!check!equal spread condition
• Report!final!results
Example: Appropriateness of Regression
• Scenario: Comparing!area!of!states!according!to!region!of!the!country!where!1!=!Northeast,!2!=!South,!3!=!Midwest,!4!=!West.!!Equation!of!the!regression!line!is! 2( = &G>H8G9 + GIHJK8$.
• Question: Is!linear!regression!appropriate!to!use?
• Answer: ________• Linear!regression!is!_____________________________________________________________________________• Could!have!assigned!_______________________________
• Slope!has!______________!because!region!___________________
• Should!use!______________________________________
Example: Appropriateness of Regression
• Scenario: Comparing!a!person�s!self-described!happiness!on!a!scale!from!0-15!with!the!number!of!close!friends!they!have
• Question: Is!linear!regression!appropriate!to!use?
• Answer: ________• Residual!plot!_____________________________________
• Happiness!______________________________________________________________________
• Happiness!______________________________________________________
Lurking Variables
• Lurking Variable: a!variable!that!could!potentially!confound!or!confuse!a!relationship!between!two!other!observed!variables• May!seem!as!if!the!predictor!(X)!causes!the!response!(Y)!to!occur
• In!reality,!the!relationship!between!X!and!Y!is!a!result!of!the!lurking!variable!(L)!being!related!to!both
X Y
L
Observed!Relationship
Example: Lurking Variables
• Scenario: Scatterplot!comparing!number!of!firefighters!at!a!fire!with!the!amount!of!damage!caused!(in!dollars).
• Question: Does!having!more!firefighters!cause!more!damage?
• Answer: ____________________
• Question: What!is!the!lurking!variable?
• Answer: ____________________________• _____________________!require!more!help
Firefighters Damage
_________________