Relationship Between Two Quantitative Variables

• Scatterplots
• Covariance, Correlation
• Outliers
• Linear Regression
• Residuals, Variation in the Model
• Lurking Variables and Causation

Relationship Between Two Quantitative Variables

Lecture 5: Sections 4.1-4.11

Scatterplot

• Scatterplot: graphical display of the relationship between two quantitative variables
  • Response Variable: variable plotted along the y-axis that we are trying to explain or predict
  • Predictor Variable: variable plotted along the x-axis that we use to explain changes in the response variable
  • Observations are plotted as ordered pairs

Example: Response vs. Predictor

• Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, based on a random sample of 44 high school seniors.
• Question: What should be the response variable? What should be the predictor variable?
• Answer:
  • Response: _________________________
  • Predictor: _________________________

Describing a Scatterplot

• Direction: As the predictor variable increases…
  • Positive: response tends to increase
  • Negative: response tends to decrease
  • Neither: no obvious change in response
• Form:
  • Linear: response tends to change at about the same rate across all values of the predictor
  • Curved: rate at which the response changes depends on the value of the predictor
  • No Pattern
• Strength: How tightly clustered are the points?
  • Usually described as strong, moderate, or weak.

Example: Describing a Scatterplot

• Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, based on a random sample of 44 high school seniors.
• Question: How should we describe the relationship?
• Answer: ______________________________
  • Math scores change at ______________________________
  • Math scores tend to ______________________________

Example: Describing a Scatterplot

• Scenario: A supermarket wants to know how much it should charge for a gallon of milk. It changes the price every day for 3 weeks and records the number of units sold.
• Question: How should we describe the relationship?
• Answer: ______________________________
  • Sales tend to ______________________________
  • Sales decline ______________________________ at lower values than at higher values
    • Not ________________________

Covariance

• Covariance: measure of joint variability between two quantitative variables

$$s_{xy} = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$

• The sign of the covariance gives the direction of the relationship
• Covariance is unbounded: values range from $-\infty$ to $\infty$
• Problem: It does not help us interpret the strength of the relationship
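To see why the covariance says little about strength, here is a minimal Python sketch of the sample covariance formula. The heights and weights are made-up illustration data, not values from the lecture. Converting the heights from inches to centimeters changes the covariance even though the scatterplot's pattern is unchanged.

```python
def sample_covariance(xs, ys):
    """Sample covariance s_xy, using the n - 1 denominator."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data for illustration only.
heights_in = [60, 62, 64, 66, 68]
weights_lb = [110, 120, 135, 150, 160]

cov_in = sample_covariance(heights_in, weights_lb)

# Same people, heights re-expressed in centimeters.
heights_cm = [h * 2.54 for h in heights_in]
cov_cm = sample_covariance(heights_cm, weights_lb)

print(cov_in, cov_cm)  # the second value is 2.54 times the first
```

The relationship is equally strong in both cases, yet the covariance grows with the units of measurement.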

Example: Covariance

• Scenario: Math vs. verbal SAT scores on the left. Comparison of a sample of students' midterm and final exam scores on the right.
• Question: Which scatterplot has the stronger linear relationship?
• Answer: ______________________________
  • Points are more ______________________________

Example: Covariance

• Scenario: Math vs. verbal SAT scores on the left. Comparison of a sample of students' midterm and final exam scores on the right.
• Question: What does the covariance tell us?
• Answer: ______________________________
  • Covariance will be large if the _______________ of the observations are large regardless of how ______________________________

$s_{xy}$ = __________

$s_{xy}$ = ______

Correlation

• Correlation: measure of the strength and direction of the linear relationship between two quantitative variables
• Two Types:
  • Population Correlation: Denoted by $\rho$ (Greek letter "rho")
    • Parameter → Generally unknown
  • Sample Correlation: Denoted by $r$
    • Statistic → Good approximation of $\rho$

$$r = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy}$ is the sample covariance, $s_x$ is the standard deviation of the predictor values, and $s_y$ is the standard deviation of the response values.

Correlation Facts

• Bounded between -1 and +1
  • Sign dictates direction of relationship (positive or negative)
  • Magnitude dictates strength of relationship
    • Rule of Thumb: If the magnitude is…
      • Between 0.70 and 1.00, the strength of the relationship is strong.
      • Between 0.40 and 0.70, the strength of the relationship is moderate.
      • Between 0.10 and 0.40, the strength of the relationship is weak.
      • Between 0.00 and 0.10, there is little to no relationship.
• Scatterplot must have a linear form for the correlation to make sense.
  • As the scatterplot becomes more curved, the correlation becomes less accurate as a means of describing the relationship.
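The rule of thumb above can be written as a small Python helper. This is only a sketch: the function name is made up, and since the slide's ranges share their endpoints, the code assumes each boundary value (0.70, 0.40, 0.10) belongs to the stronger category.

```python
def describe_correlation(r):
    """Describe a correlation using the rule of thumb from the slide.

    Assumption: boundary values (0.70, 0.40, 0.10) go to the stronger category.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude < 0.10:
        return "little to no linear relationship"
    if magnitude >= 0.70:
        strength = "strong"
    elif magnitude >= 0.40:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.588))  # moderate positive
print(describe_correlation(-0.85))  # strong negative
```

The description is only meaningful when the scatterplot has a linear form, so check the plot before relying on the label.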

Example: Calculating Correlation

• Scenario: Comparing height and weight of 15 adult males.
• Question: What is the correlation between height and weight?
• Answer: __________________________________
• Question: How should we describe the linear relationship?
• Answer: __________________________________

Variable         Mean     Std. Dev.   Covariance
Height (In.)     68.66    1.76        12.92
Weight (Lbs.)    192.91   16.55
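As a check on the hand calculation, the correlation formula can be applied directly to the summary statistics in the table. This is a minimal sketch; the variable names are illustrative.

```python
# Summary statistics taken from the table above.
s_xy = 12.92   # sample covariance of height and weight
s_x = 1.76     # standard deviation of height (in.)
s_y = 16.55    # standard deviation of weight (lbs.)

# r = s_xy / (s_x * s_y)
r = s_xy / (s_x * s_y)
print(round(r, 3))  # 0.444
```

Note that the units (inch-pounds in the numerator and denominator) cancel, which is why the correlation is unitless.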

Using Excel

Example: Approximating Correlations

• Scenario: Scatterplots show observations with predictors from 1 to 25 and responses from 0 to 100.
• Task: Sort the scatterplots from weakest correlation to strongest.
• Answer: __________________
• Question: Approximately what are the correlations displayed in each scatterplot?
• Answers:
  A. $r$ = _____   B. $r$ = _____   C. $r$ = ______

Outliers

• Outlier: point on a scatterplot that is unusually far away from the rest of the data
  • Has the potential to drastically change the value and/or direction of the correlation
  • If an outlier exists, report the correlation with and without it included in the calculation
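The effect of a single outlier can be illustrated with a short Python sketch. The data are made up for illustration: six points on a perfect line, plus one point far below the pattern.

```python
import math

def correlation(xs, ys):
    """Sample correlation r (the n - 1 factors cancel, so sums suffice)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect linear pattern...
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_without = correlation(xs, ys)

# ...and the same data with one outlier at (7, 0).
r_with = correlation(xs + [7], ys + [0])

print(f"without outlier: r = {r_without:.2f}")  # 1.00
print(f"with outlier:    r = {r_with:.2f}")     # 0.25
```

One point moves the correlation from perfect to weak, which is why both values should be reported.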

Example: Outliers

• Scenario: Scatterplot shows final scores of 1st-round games between 1 and 16 seeds in the NCAA Tournament over the last 2 years.
• Question: Which observation is an outlier?
• Answer: ___________________ → 1 seed: ____; 16 seed: ____
  • First time a ______________________________
• Question: What impact does this point have on the correlation?
• Answer: Significantly __________________
  • With: $r$ = ________; Without: $r$ = ________
• Takeaway: Outliers can ______________________ what appears to be an otherwise __________________________.

[Figure label: Virginia vs. UMBC]

Least Squares Regression Line

• Least Squares Regression Line: the best line that can be drawn through data on a scatterplot
  • Summarizes the general pattern to help us understand how the variables are related
  • Allows us to make predictions about the response (Y) given some value of the predictor (X)
  • Almost never perfect, but gives us a good idea of how the response changes as the predictor changes

[Figure label: Least Squares Regression Line]

Example: Preparing for Linear Regression

[Figure annotations: "Necessary for finding equation of regression line"; "Not necessary for finding regression line, but help us understand scope of data"]

Least Squares Regression Line

• Least Squares Regression Line:

$$\hat{y} = b_0 + b_1 x$$

where $\hat{y}$ ("y-hat") is the predicted value of the response, $b_0$ is the intercept, $b_1$ is the slope, and $x$ is the value of the predictor.

• Slope: $b_1 = r \dfrac{s_y}{s_x}$
  • Estimated increase in the prediction given a one-unit increase in the predictor
• Intercept: $b_0 = \bar{y} - b_1 \bar{x}$
  • Predicted value of the response variable when the predictor equals 0

Example: Calculating Slope and Intercept

• Scenario: Using the number of attractions at an amusement park to predict the admission price
• Question: What is the equation of the regression line?
• Answer:
  • Slope: ______________________________
  • Intercept: ______________________________
  • Regression Line: ______________________________

Variable               Mean   Std. Dev.   Correlation
Attractions (X)        30     10.23       .588
Admission Price (Y)    64     21.53
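The slope and intercept formulas can be checked against the summary statistics in the table with a few lines of Python. This is a sketch under one assumption: the slope is rounded to two decimals before the intercept is computed. That assumption is consistent with the $57.80 prediction for 25 attractions quoted later in the lecture.

```python
# Summary statistics taken from the table above.
r = 0.588
x_bar, s_x = 30, 10.23     # attractions (predictor)
y_bar, s_y = 64, 21.53     # admission price (response)

# Slope: b1 = r * s_y / s_x
b1 = r * s_y / s_x

# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# Assumed rounding convention: round the slope first, then find the intercept.
b1_rounded = round(b1, 2)
b0_rounded = round(y_bar - b1_rounded * x_bar, 2)

print(f"unrounded: y-hat = {b0:.3f} + {b1:.4f}x")
print(f"rounded:   y-hat = {b0_rounded:.2f} + {b1_rounded:.2f}x")
```

Carrying full precision gives an intercept near 26.87, so the order of rounding changes the second decimal of the intercept.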

Example: Interpreting Slope and Intercept

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The equation of the regression line is:

$$\hat{y} = 26.80 + 1.24x$$

• Question: What do the intercept and slope tell us?
• Answer:
  • Intercept: A park that has _________________ would have a ______________________________
  • Slope: For every additional ______________________________, the ______________________________.

Warning: Make sure you include the word “predicted” or “expected”. The regression line only approximates values.

Example: Making Predictions

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The equation of the regression line is:

$$\hat{y} = 26.80 + 1.24x$$

• Question: What would we expect to pay to enter an amusement park with 25 attractions?
• Answer: _______________________________

__________________
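Making a prediction amounts to plugging the predictor value into the regression line. The sketch below assumes the line $\hat{y} = 26.80 + 1.24x$; the function name and default arguments are illustrative.

```python
def predict_price(attractions, intercept=26.80, slope=1.24):
    """Predicted admission price (dollars) from the assumed regression line."""
    return intercept + slope * attractions

predicted = predict_price(25)
print(f"${predicted:.2f}")  # $57.80
```

This is a predicted (expected) price, not a guarantee: individual parks will fall above or below the line.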

Residuals

• Residual: the observed value from the data minus the predicted value of the response from the regression line

$$e = y - \hat{y}$$

where $e$ is the residual (error), $y$ is the observed value from the data, and $\hat{y}$ is the predicted value from the regression line.

• Observations that are:
  • Close to the regression line will have small residuals
  • Far away from the regression line will have large residuals
  • Above the regression line will have positive residuals
  • Below the regression line will have negative residuals

Example: Residuals

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The amusement park that had 25 attractions had an admission price of $66.
• Question: What is the residual?
• Answer: ___________________________________
• Question: What does the residual mean?
• Answer: Admission price is _______________________ than what would be ______________________________.
  • Observation lies ______________________________
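The residual calculation can be sketched in Python as observed minus predicted. The predicted value of $57.80 is the one quoted later in the lecture for this park; the function name is illustrative.

```python
def residual(observed, predicted):
    """Residual e = y - y_hat (observed minus predicted)."""
    return observed - predicted

e = residual(observed=66.00, predicted=57.80)
position = "above" if e > 0 else "below" if e < 0 else "on"

print(f"residual = {e:.2f}; the point lies {position} the regression line")
```

The sign carries the interpretation: a positive residual means the observed value is higher than the line predicts.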

Assumptions in Linear Regression

• Quantitative Data Condition: Regression only applies to quantitative data.
• Linearity Assumption: The relationship between the predictor and response is fairly linear.
• Outlier Condition: No individual point drastically changes the slope of the regression line.
• Independence Assumption: Observations in the data are selected independently of one another.
• Equal Spread Condition: The spread of the observations around the regression line is consistent across all values of the predictor.

Equal Spread Condition

• Residual Plot: plot of predictor values against the corresponding residual for each observation
  • Want: Plot with no discernible patterns: no direction and no shape

[Figure: residual plots.
  Equal Spread Condition Satisfied: no patterns; range of residuals is the same across all predictor values.
  Equal Spread Condition Not Satisfied: magnitude of residual depends on value of predictor; sign of residual depends on value of predictor.]

Example: Residual Plot

• Scenario: Using the number of attractions at an amusement park to predict the admission price. Residual plot shown below.
• Question: What does the residual plot reveal?
• Answer: We would expect the actual admission price of an amusement park to be ______________________________
• Question: Does the equal spread condition appear to hold?
• Answer: ________
  • Residuals appear to ______________________________

Standard Deviation of Residuals

• Standard Deviation of Residuals: measure of the size of a typical residual

$$s_e = \sqrt{\frac{\sum e^2}{n - 2}}$$

• Just like the "regular" standard deviation, residuals that are:
  • Within one standard deviation of the regression line are quite common
  • More than three standard deviations from the regression line are likely outliers

Note: The standard deviation of the residuals is rarely (if ever) calculated by hand. We’ll see how to use Excel later to get this value.
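Although this value normally comes from software, the formula is short enough to sketch in Python. The residuals below are made up for illustration; the only point is the $n - 2$ denominator.

```python
import math

def residual_std_dev(residuals):
    """Standard deviation of residuals: sqrt(sum(e^2) / (n - 2))."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than two residuals")
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Hypothetical residuals from some fitted line.
residuals = [1, -1, 2, -2]
s_e = residual_std_dev(residuals)

print(round(s_e, 3))  # 2.236
```

The denominator is $n - 2$ rather than $n - 1$ because the line uses two estimated quantities, the slope and the intercept.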

Example: Standard Deviation of Residual

• Scenario: The admission price for an amusement park with 25 rides was $66. The predicted value was $57.80 and the standard deviation of the residuals is $17.76.
• Question: How unusual was our observation?
• Answer: __________________________
  • Actual Residual: ____________
  • Standard Deviation of Residual: ____________
    • Actual residual is _________________: observed value within ______________________________
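One way to judge how unusual the observation is, sketched in Python, is to express the residual in units of the residual standard deviation. The variable names are illustrative; the numbers are from the scenario.

```python
observed = 66.00     # actual admission price
predicted = 57.80    # price predicted by the regression line
s_e = 17.76          # standard deviation of the residuals

residual = observed - predicted
std_devs_from_line = residual / s_e

print(f"residual = {residual:.2f}")
print(f"{std_devs_from_line:.2f} standard deviations from the line")
```

A value well under one standard deviation falls in the "quite common" range described on the previous slide.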

Variation in the Linear Model

• Two extremes:
  • Perfect Linear Model: regression line perfectly predicts the response all the time; all residuals would be zero
  • “Useless” Linear Model: predictor gives no information about the response
• Most linear models fall somewhere in between
  • Predictor yields some information about the response, but predictions are not perfect and result in some residuals

[Figure: three scatterplots labeled Perfect, Useless, and Typical]

Variation in the Linear Model

• R-Squared: the fraction of the variability in the response ($Y$) that is explained by the predictor ($X$)
  • Calculated by squaring the correlation: $r^2$
  • No linear relationship between the variables if $r^2 = 0$
  • Observations fall on a straight line if $r^2 = 1$
  • Generally reported as a percentage

Note: The remainder of the variation, $(1 - r^2)$, is due to natural fluctuations of the response in the sample and comes from the residuals.

Example: R-Squared

• Scenario: Using the number of attractions at an amusement park to predict the admission price. Correlation is .588.
• Question: What is the R-squared value?
• Answer: _____________________________________
• Question: What does the R-squared mean in context?
• Answer: _________ of the variability _____________________ is explained by ______________________________
  • The remaining ____________ is due to ______________________________ caused by other factors such as:
    • ______________________
    • ______________________
    • ______________________
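The R-squared arithmetic for this scenario can be sketched in Python by squaring the stated correlation and reporting both parts as percentages.

```python
r = 0.588  # correlation between attractions and admission price

r_squared = r ** 2
explained_pct = 100 * r_squared
unexplained_pct = 100 - explained_pct

print(f"{explained_pct:.1f}% explained, {unexplained_pct:.1f}% unexplained")
# 34.6% explained, 65.4% unexplained
```

Because squaring shrinks values below 1, a moderate correlation corresponds to a fairly small explained share.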

Using Excel

• To do regression in Excel, you must install the Data Analysis ToolPak, which is an add-in available on:
  • Excel 2007, 2010, 2013, and 2016 for PC
  • Excel 2016 for Mac
• To do regression in Excel:
  • Choose "Data Analysis" from the "Data" tab to get the menu below

Links to videos are posted on CourseWeb to help you install the add-in.

Using Excel

Note: Focus on the parts of the output boxed in red. We’ll worry about the rest once we get to inference later in the semester.

Appropriateness of Regression

• To determine if regression is appropriate:
  • Ensure variables are quantitative and observations are independent
  • Check the scatterplot for:
    • Linearity
    • Outliers
  • Calculate the equation of the regression line
  • Use the regression line to calculate the residuals
  • Create the residual plot and check the equal spread condition
  • Report final results
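The computational steps in the checklist above (fit the line, compute the residuals, summarize them) can be sketched in plain Python. The data are hypothetical and the function name is illustrative. The visual checks, meaning the scatterplot and residual plot, still have to be done by eye.

```python
import math

def fit_line(xs, ys):
    """Return (b0, b1, r) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b1 = sxy / sxx            # algebraically equal to r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1, r

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1, r = fit_line(xs, ys)

# Residuals: observed minus predicted, one per observation.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals, with the n - 2 denominator.
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))

print(f"y-hat = {b0:.2f} + {b1:.2f}x, r^2 = {r ** 2:.2f}, s_e = {s_e:.3f}")
```

Plotting `residuals` against `xs` would give the residual plot used to check the equal spread condition.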

Example: Appropriateness of Regression

• Scenario: Comparing the area of states according to region of the country, where 1 = Northeast, 2 = South, 3 = Midwest, 4 = West. The fitted regression line $\hat{y} = b_0 + b_1 x$ has a negative intercept and a positive slope (the numeric coefficients are not legible in this transcript).
• Question: Is linear regression appropriate to use?
• Answer: ________
  • Linear regression is ______________________________
    • Could have assigned ______________________________
    • Slope has ______________ because region ___________________
    • Should use ______________________________

Example: Appropriateness of Regression

• Scenario: Comparing a person's self-described happiness on a scale from 0 to 15 with the number of close friends they have
• Question: Is linear regression appropriate to use?
• Answer: ________
  • Residual plot ______________________________
  • Happiness ______________________________
  • Happiness ______________________________

Lurking Variables

• Lurking Variable: a variable that could potentially confound or confuse a relationship between two other observed variables
  • It may seem as if the predictor (X) causes the response (Y) to occur
  • In reality, the relationship between X and Y is a result of the lurking variable (L) being related to both

[Diagram: the lurking variable L is related to both X and Y, producing the observed relationship between X and Y]

Example: Lurking Variables

• Scenario: Scatterplot comparing the number of firefighters at a fire with the amount of damage caused (in dollars).
• Question: Does having more firefighters cause more damage?
• Answer: ____________________
• Question: What is the lurking variable?
• Answer: ____________________________
  • _____________________ require more help

[Diagram: Firefighters and Damage, both related to _________________]