Symboisis Statistics
Transcript of Symboisis Statistics
-
7/28/2019 Symboisis Statistics
1/102
Median and Mean of a Density Curve
The median of a density curve is the equal-areas point, the point that divides the area under the curve in half.
The mean of a density curve is the balance point, at which the curve would balance if made of solid material.
The median and mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
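The pull of the mean toward the long tail can be checked on a small sample. The sketch below uses two made-up data sets (both are assumptions for illustration, not data from the slides): one with a long right tail and one symmetric.

```python
from statistics import mean, median

# Hypothetical right-skewed data: mostly small values plus one long-tail value.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]
# Hypothetical symmetric data for comparison.
symmetric = [1, 2, 3, 4, 5, 6, 7]

skewed_mean, skewed_median = mean(skewed), median(skewed)
sym_mean, sym_median = mean(symmetric), median(symmetric)

# For the skewed sample the mean sits to the right of the median;
# for the symmetric sample the two coincide.
print("skewed:   ", skewed_mean, skewed_median)
print("symmetric:", sym_mean, sym_median)
```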
Statistics
Founded in 1890, the Literary Digest magazine was famous for its success in conducting polls to predict winners in presidential elections. The magazine correctly predicted the winners in the presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest between Alf Landon and Franklin D. Roosevelt, the magazine sent out 10 million ballots and received 1,293,669 ballots for Landon and 972,897 ballots for Roosevelt, so it appeared that Landon would capture 57% of the vote.
In fact, Landon received 16,679,583 votes to the 27,751,597 votes cast for Roosevelt. Instead of getting 57% of the vote as suggested by the Literary Digest poll, Landon received only 37% of the vote. In that same 1936 presidential election, George Gallup used a much smaller poll of 50,000 subjects, and he correctly predicted that Roosevelt would win.
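The two percentages quoted above follow directly from the counts. A minimal sketch, assuming the shares are taken over the two candidates' combined totals only (the helper name `two_way_share` is ours):

```python
# Counts quoted in the text.
landon_ballots = 1_293_669
roosevelt_ballots = 972_897
landon_votes = 16_679_583
roosevelt_votes = 27_751_597

def two_way_share(a, b):
    """Share of `a` in the two-way total a + b, as a percentage."""
    return 100 * a / (a + b)

poll_share = two_way_share(landon_ballots, roosevelt_ballots)
actual_share = two_way_share(landon_votes, roosevelt_votes)

print(f"Landon in the poll:     {poll_share:.1f}%")
print(f"Landon in the election: {actual_share:.1f}%")
```

The large sample did not help the Literary Digest: the error comes from who returned the ballots, not from how many did.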
Flipping of coin
Data: A plural noun (the singular form is datum) which means a set of known or given things, facts. Note that data can be numerical (e.g. age of people) or non-numerical (e.g. gender of people).
statistics: Without a capital letter, i.e. in its lower-case form, this means a set of numerical data or figures that have been collected systematically.
Statistics: With a capital letter this is a proper noun that means the set of methods and theories that can be used to arrange, analyse and interpret statistics.
A variable: A quantity that varies, the opposite of a constant. For example, the number of mobile phones sold per day in a shop is a variable, whereas the number of hours in a day is a constant. In the expressions that we will use to summarize methods a capital letter, usually X or Y, will be used to represent a variable.
Value: A specific amount that it is possible for a variable to be. For example, the number of mobile phones sold per day could be 25 or 43 or 51. These are all possible values of the variable "number of phones sold".
Random: This adjective refers to something that occurs in an unplanned way. A random variable is a variable whose observed values arise by chance. The number of new accounts a bank opens during a month is a variable that is random, whereas the number of days in a month is a variable that is not random, i.e. its observed values are pre-determined.
Distribution: The pattern exhibited by the observed values of a variable when they are arranged in order of magnitude. A theoretical distribution is one that has been deduced, rather than compiled from observed values.
Population: Generally this means the total number of persons residing in a defined area at a given time. In Statistics a population is the complete set of things we want to investigate. These may be human, such as all the people who have visited a supermarket, or inanimate, such as all the policies issued by an insurance company.
Sample: A subset of the population, that is, a smaller number of items picked from the population. A random sample is a sample whose components have been chosen in a random way, that is, on the basis that any single item in the population has no more or less chance than any other to be included in the sample.
Copyright 2004 Pearson Education, Inc.
Business
The etymology of "business" relates to the state of being busy, either as an individual or as society as a whole, doing commercially viable and profitable work.
A business (also known as an enterprise or firm) is an organization engaged in the trade of goods, services, or both to consumers.
Business statistics can be described as the collection, summarization, analysis, and reporting of numerical findings relevant to a business decision or situation.
Why Statistics
Time has three phases: past, present and future.
The continuation and growth of any business depend on strategic decisions based on finance, operations or the market.
Decision making is crucial, whether it is based on intuition or on information and knowledge.
Data (facts of the present) → Analysis → Information → Knowledge
Knowledge-based decisions rest on some model.
There is a time lag between awareness of an impending event or need and the occurrence of that event.
This is the lead time, and hence planning and forecasting are needed.
An occurrence is either random or has a causal relation.
Statistics helps here.
Properties of Estimators
Parameters: describe the population.
Statistics: describe samples, but we use them to estimate population parameters.
Properties:
1. Sufficiency
2. Unbiasedness
3. Resistance
4. Efficiency
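Samples of Two from the above population
Sample y: 1, 2
If we divide by n:
$s^2 = \frac{\sum (y - \bar{y})^2}{n} = 0.25$
If we divide by n - 1:
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = 0.50$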
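The two variance figures for the sample 1, 2 can be reproduced with Python's standard library. This is only a sketch: `pvariance` divides the sum of squared deviations by n, and `variance` divides by n - 1.

```python
from statistics import pvariance, variance

y = [1, 2]  # the sample of two from the slide

# Sum of squared deviations from the mean (1.5): 0.25 + 0.25 = 0.5
divide_by_n = pvariance(y)          # 0.5 / 2
divide_by_n_minus_1 = variance(y)   # 0.5 / 1

print(divide_by_n, divide_by_n_minus_1)
```

Dividing by n - 1 gives the larger value; that is the version whose average over all possible samples matches the population variance, which is why it is described as unbiased.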
(1) Carefully defining the situation, (2) gathering data, (3) accurately summarizing the data, and (4) deriving and communicating meaningful conclusions.
Statistics: The science of collecting, describing, and interpreting data.
Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
Sample: A subset of a population.
Variable (or response variable): A characteristic of interest about each individual element of a population or sample.
Data value: The value of the variable associated with one element of a population or sample. This value may be a number, a word, or a symbol.
Data: The set of values collected from the variable from each of the elements that belong to the sample.
Experiment: A planned activity whose results yield a set of data.
Parameter: A numerical value summarizing all the data of an entire population.
Statistic: A numerical value summarizing the sample data.
Qualitative, or attribute, or categorical, variable: A variable that describes or categorizes an element of a population.
A variable is simply something that can vary: that is, it can take on many different values or categories. Examples of variables are gender, typing speed, top speed of a car, number of reported symptoms of an illness, temperature, attendances at rock festivals (e.g. the Download festival), level of anxiety, number of goals scored in football matches, intelligence, number of social encounters while walking your dog, amount of violence on television, occupation and favourite colours. These are all things that we can measure and record and that vary. We are generally interested in variables because we want to understand why they vary as they do.
Ordinal variable: A qualitative variable that incorporates an ordered position, or ranking.
Discrete variable: A quantitative variable that can assume a countable number of values. Intuitively, the discrete variable can assume any values corresponding to isolated points along a line interval. That is, there is a gap between any two values.
Continuous variable: A quantitative variable that can assume an uncountable number of values. Intuitively, the continuous variable can assume any value along a line interval, including every possible value between any two values.
Biased sampling method: A sampling method that produces data that systematically differ from the sampled population. An unbiased sampling method is one that is not biased.
Sampling frame: A list, or set, of the elements belonging to the population from which the sample will be drawn.
Data is numerical information
Data
Information
Analysis
Knowledge
Data alone is useless: it has to be organized, summarized and presented, and on that basis analyzed or used for estimation. These are the functions of statistics.
Measurement is either quantitative or qualitative.
Scales used:
Nominal scale
Ordinal scale
Interval scale
Ratio scale
event is more likely to occur. Probabilities closer to 0 indicate that the event is less likely to occur.
P(A), read "P of A", denotes the probability of event A.
If P(A) = 1, the event A is certain to occur.
If P(A) = 0, the event A is certain not to occur.
Probability is the basis of inferential statistics.
An event is an outcome (or a set of outcomes) of an experiment.
The sample space is the collection of all possible outcomes (sample points).
1. All sample point probabilities lie between 0 and 1.
2. The sum of the probabilities of all sample points within the sample space = 1.
Mutually exclusive events cannot occur together. They are not statistically independent: if one occurs, the other cannot.
When two events are mutually exclusive, the probability of A or B occurring can be expressed by the following addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
For one card drawn from a full deck, the ace of spades and the queen of spades have probability
P(As or Qs) = 1/52 + 1/52 = 2/52. (If a card is drawn and not replaced, each remaining card then has probability 1/51 on the next draw.)
If two events are not mutually exclusive, the addition rule becomes P(A or B) = P(A) + P(B) - P(AB)
P(AB) is the joint probability. For statistically independent events it is calculated as the product of the individual marginal probabilities: P(AB) = P(A) * P(B)
The concept of statistical dependence implies that the probability of a certain event is dependent on the occurrence of another event.
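The addition rule and the product rule can be checked by enumerating a small sample space. The sketch below uses two fair dice, which is our own illustrative choice rather than an example from the slides. It also shows that two mutually exclusive events with nonzero probability do not satisfy the product rule.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

p_a = prob(first_is_six)
p_b = prob(second_is_six)
p_ab = prob(lambda o: first_is_six(o) and second_is_six(o))
p_a_or_b = prob(lambda o: first_is_six(o) or second_is_six(o))

# General addition rule: P(A or B) = P(A) + P(B) - P(AB)
print(p_a_or_b, p_a + p_b - p_ab)
# Independent events: P(AB) = P(A) * P(B)
print(p_ab, p_a * p_b)

# Mutually exclusive events: "sum is 2" and "sum is 12" never occur together.
p_sum2 = prob(lambda o: sum(o) == 2)
p_sum12 = prob(lambda o: sum(o) == 12)
p_both_sums = prob(lambda o: sum(o) == 2 and sum(o) == 12)
print(p_both_sums, p_sum2 * p_sum12)  # joint probability is 0, the product is not
```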
... or successes to the total number of outcomes. Expressed as a formula,
$P(A) = \frac{\text{number of successes}}{\text{total number of outcomes}}$
The classic theory assumes that all outcomes have equal likelihood of occurring. In the example just cited, each card must have an equal chance of being chosen: no card is larger than any other or in any way more likely to be chosen than any other card. The classic theory pertains only to outcomes that are mutually exclusive (or disjoint), which means that those outcomes may not occur at the same time. For example, one coin flip can result in a head or a tail, but one coin flip cannot result in a head and a tail. So the outcome of a head and the outcome of a tail are said to be mutually exclusive in one coin flip, as is the outcome of an ace and a king as the outcome of one card being drawn.
A probability assignment based on equally likely outcomes uses the formula
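Under the equally-likely assumption, a probability is the count of favourable outcomes divided by the count of all outcomes. A minimal sketch of that ratio for a coin flip and a single card draw (the helper name `classical_probability` is ours):

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favourable outcomes / total outcomes, assuming equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# One coin flip: two equally likely outcomes.
coin = ["head", "tail"]
p_head = classical_probability(lambda side: side == "head", coin)

# One card drawn from a 52-card deck; only the rank matters here.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck_ranks = ranks * 4  # four suits
p_ace = classical_probability(lambda r: r == "A", deck_ranks)
p_king = classical_probability(lambda r: r == "K", deck_ranks)
# Ace and king are mutually exclusive on a single draw, so the probabilities add.
p_ace_or_king = classical_probability(lambda r: r in ("A", "K"), deck_ranks)

print(p_head, p_ace, p_ace_or_king)
```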
Chapter 11
Introduction to Hypothesis
Testing
Nonstatistical Hypothesis Testing
A criminal trial is an example of hypothesis testing without the statistics.
In a trial a jury must decide between two hypotheses. The null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
Nonstatistical Hypothesis Testing
There are two possible errors.
A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. We would want the probability of this type of error [maybe 0.001, beyond a reasonable doubt] to be very small for a criminal trial where a conviction results in the death penalty, whereas for a civil trial,
where conviction might result in someone
Nonstatistical Hypothesis Testing
A Type II error occurs when we don't reject a false null hypothesis [accept the null hypothesis]. That occurs when a guilty defendant is acquitted.
In practice, this type of error is by far the most serious mistake we normally make.
For example, if we test the hypothesis that the amount of medication in a heart pill is equal to a value which will cure your heart problem and accept the null hypothesis
Nonstatistical Hypothesis Testing
The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta).
The two probabilities are inversely related. Decreasing one increases the other, for a fixed sample size.
In other words, you can't have α and β both really small for any old sample size.
Types of Errors
A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).
H0 is:              TRUE      FALSE
Reject H0           Type I
Do not reject H0              Type II
Nonstatistical Hypothesis Testing
The critical concepts are these:
1. There are two hypotheses, the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is true.
3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, or that the null is not likely to be true.
4. There are two possible decisions:
Conclude that there is enough evidence to support the alternative hypothesis. Reject the null.
Conclude that there is not enough evidence to support the alternative hypothesis. Fail to reject the null.
Concepts of Hypothesis Testing (1)
The two hypotheses are called the null hypothesis and the alternative or research hypothesis. The usual notation is:
H0: the null hypothesis (pronounced "H nought")
H1: the alternative or research hypothesis
Concepts of Hypothesis Testing
Consider mean demand for computers during assembly lead time. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. In other words, someone is claiming that the mean demand is 350 units and we want to check this claim out to see if it appears reasonable. We can rephrase this request into a test of the hypothesis:
H0: μ = 350
Concepts of Hypothesis Testing
For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence.
If x̄ is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.
Concepts of Hypothesis Testing (4)
The two possible decisions that can be made:
Conclude that there is enough evidence to support the alternative hypothesis
(also stated as: reject the null hypothesis in favor of the alternative)
Conclude that there is not enough evidence to support the alternative hypothesis
(also stated as: failing to reject the null hypothesis in favor of the alternative)
NOTE: we do not say that we accept the null hypothesis if a statistician is around
Concepts of Hypothesis Testing (2)
The testing procedure begins with the assumption that the null hypothesis is true.
Thus, until we have further statistical evidence, we will assume:
H0: μ = 350 (assumed to be TRUE)
The next step will be to determine the
Is the Sample Mean in the Guts of the Sampling Distribution?
Three ways to determine this: First way
1. Unstandardized test statistic: Is x̄ in the guts of the sampling distribution? That depends on what you define as the "guts" of the sampling distribution.
If we define the guts as the center 95% of the distribution [this means α = 0.05], then the critical values that define the guts will be 1.96 standard deviations of x̄ on either side of the mean of the
1. Unstandardized Test Statistic Approach
Three ways to determine this: Second way
2. Standardized test statistic: Since we defined the "guts" of the sampling distribution to be the center 95% [α = 0.05],
If the Z-score for the sample mean is greater than 1.96, we know that x̄ will be in the reject region on the right side, or
If the Z-score for the sample mean is less than -1.96, we know that x̄ will be in the reject region on the left side.
2. Standardized Test Statistic Approach
Three ways to determine this: Third way
3. The p-value approach (which is generally used with a computer and statistical software): Increase the rejection region until it captures the sample mean.
For this example, since x̄ is to the right of the mean, calculate
P(x̄ > 370.16) = P(Z > 1.344) = 0.0901
Since this is a two-tailed test, you must double this area for the p-value.
p-value = 2*(0.0901) = 0.1802
Since we defined the guts as the center 95% [α = 0.05], the reject region is the other 5%. Since our sample mean, x̄, is in the 18.02% region, it cannot be in our 5%
7/28/2019 Symboisis Statistics
49/102
11.49
3. p-value approach
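The two-tailed p-value can be approximated without tables by using the complementary error function. This is a sketch under stated assumptions: it takes z = 1.344 from the slide and uses an exact normal tail, so the result differs slightly from the table-based 0.1802 (tables round z to 1.34).

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

z = 1.344      # standardized sample mean from the slide
alpha = 0.05   # significance level (center 95% defines the "guts")

p_value = 2 * upper_tail(z)    # two-tailed test: double the one-tail area
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Because the p-value is well above 0.05, the sample mean is not in the rejection region and the null hypothesis is not rejected.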
Statistical Conclusions:
Unstandardized Test Statistic:
Since LCV (320.6) < x̄ (370.16) ...
... μ > 170 (this is what we want to determine)
Example 11.1
What we want to show:
H1: μ > 170
H0: μ ≤ 170 (we'll assume this is true)
Normally we put H0 first.
We know:
n = 400, x̄ = 178, and σ = 65
σ_x̄ = 65/SQRT(400) = 3.25
Example 11.1 Rejection Region
The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.
The boundary of this region is the critical value of x̄ needed to reject H0.
Example 11.1
At a 5% significance level (i.e. α = 0.05), we get [all in one tail]
Z_α = Z_0.05 = 1.645
Therefore, UCV = 170 + 1.645*3.25 = 175.35
Since our sample mean (178) is greater than the critical value we calculated (175.35), we reject the null hypothesis in favor of H1
OR
Z = (178 - 170)/3.25 = 2.46 (> 1.645): reject the null
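The arithmetic of Example 11.1 can be reproduced in a few lines. A minimal sketch using only the figures given in the example (variable names are ours):

```python
from math import sqrt

mu0 = 170        # value of the mean under the null hypothesis
x_bar = 178      # sample mean
sigma = 65       # population standard deviation
n = 400          # sample size
z_alpha = 1.645  # one-tail critical z for alpha = 0.05

se = sigma / sqrt(n)       # standard error of the mean
ucv = mu0 + z_alpha * se   # upper critical value for the sample mean
z = (x_bar - mu0) / se     # standardized test statistic

# Both comparisons lead to the same decision.
reject_by_ucv = x_bar > ucv
reject_by_z = z > z_alpha

print(f"se = {se}, UCV = {ucv:.2f}, z = {z:.2f}")
print(reject_by_ucv, reject_by_z)
```

The unstandardized and standardized approaches are the same comparison on two different scales, so they always agree.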
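Example 11.1 The Big Picture
[Figure: sampling distribution of x̄ for H0: μ = 170 against H1: μ > 170, with the critical value at 175.35 and the sample mean x̄ = 178 in the rejection region.]
Reject H0 in favor of H1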
Interpreting the p-value
The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
If the p-value is between 5% and 10%, there is weak evidence that supports
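Interpreting the p-value
p-value from 0 to .01: Overwhelming evidence (highly significant)
p-value from .01 to .05: Strong evidence (significant)
p-value from .05 to .10: Weak evidence (not significant)
p-value above .10: No evidence (not significant)
In this example, p = .0069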
Conclusions of a Test of Hypothesis
If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.
If we fail to reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. This does not mean that we have proven that the null hypothesis is true!
One tail test with rejection region on right
The last example was a one tail test, because the rejection region is located in only one tail of the sampling distribution:
More correctly, this was an example of a
One tail test with rejection region on left
The rejection region will be in the left tail.
Two tail test with rejection region in both tails
The rejection region is split equally between the two tails.
Example 11.2 Students work
AT&T argues that its rates are such that customers won't see a difference in their phone bills between AT&T and its competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively). Note: we don't know the true value of σ, so we estimate it from the data [σ ≈ s = 3.87]; the sample is large, so don't worry.
They then sample 100 customers at
Example 11.2
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.
[Figure: two-tailed rejection region, "stat is small" in the left tail and "stat is large" in the right tail.]
Example 11.2
At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z_.025 = 1.96 and our rejection region is:
z < -1.96 or z > 1.96
Example 11.2
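From the data, we calculate x̄ = 17.55
Using our standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
We find that:
$z = \frac{17.55 - 17.09}{3.87 / \sqrt{100}} = 1.19$
Since z = 1.19 is not greater than 1.96, nor less than -1.96, we cannot reject the null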
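The two-tailed decision in Example 11.2 can be verified numerically. A minimal sketch using the values quoted in the example (variable names are ours):

```python
from math import sqrt

mu0 = 17.09     # mean bill under the null hypothesis
s = 3.87        # sample standard deviation used as the estimate of sigma
n = 100         # sample size
x_bar = 17.55   # sample mean
z_crit = 1.96   # two-tailed critical value for alpha = 0.05

z = (x_bar - mu0) / (s / sqrt(n))

# Reject only if z falls in either tail of the rejection region.
reject_null = z < -z_crit or z > z_crit

print(f"z = {z:.2f}, reject H0: {reject_null}")
```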
Summary of One- and Two-Tail Tests
One-Tail Test (left tail) | Two-Tail Test | One-Tail Test (right tail)
Probability of a Type II Error
A Type II error occurs when a false null hypothesis is not rejected, or "you accept the null when it is not true" (but don't say it this way if a statistician is around).
In practice, this is by far the most serious error you can make in most cases, especially in the quality field.
Judging the Test
A statistical test of hypothesis is effectively defined by the significance level (α) and the sample size (n), both of which are selected by the statistics practitioner.
Therefore, if the probability of a Type II error (β) is too large [we have insufficient power], we can reduce it by
increasing α, and/or
increasing the sample size n.
Judging the Test
The power of a test is defined as 1 - β. It represents the probability of rejecting the null hypothesis when it is false and the true mean is something other than the null value for the mean.
If we are testing the hypothesis that the average amount of medication in blood pressure pills is equal to 6 mg (which is good), and we fail to reject the null hypothesis, ship the pills to patients worldwide, only to find out later that the true average amount of medication is really 7 mg and people die, we get in trouble. This occurred because P(reject the null | true mean = 7 mg) = 0.32, which would mean that we have a 68% chance of not rejecting the null for these BAD pills and shipping them to patients worldwide.
Probability you ship pills whose mean amount of medication is 7 mg: approximately 67%
Definition
When we select a sample from a population and then try to estimate the population parameter from the sample, we will not be entirely accurate. The difference between the population parameter and the sample statistic is the sampling error.
Data collection
Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data.
The goal of statistics is to gain understanding from data.
Individuals are the people or objects included in the study.
A variable is a characteristic of the individual to be measured or observed.
A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense.
A qualitative variable describes an individual by placing the individual into a category or group, such as male or female. It is a categorical variable.
In population data, the data are from every individual of interest.
In sample data, the data are from only some of the individuals of interest.
A parameter is a numerical measure that describes an aspect of a population.
A statistic is a numerical measure that describes an aspect of a sample.
DATA
Summarizing the data: Summarization is a process in which the data is reduced for interpretation without sacrificing any important information.
Data analysis tasks: finding hidden relationships, anomalies and trends; estimating; predicting.
Data
[Diagram: classification of data by element and variable. A variable is either qualitative (nominal or ordinal scale for measurement) or quantitative (interval or ratio scale for measurement); a quantitative variable is either continuous or discrete.]
... investigator is interested. The population is also called the universe.
A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, such that every possible sample of n elements will have an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.
A survey by an electric company contains questions on the following:
1. Age of household head.
2. Sex of household head.
3. Number of people in household.
4. Use of electric heating (yes or no).
5. Number of large appliances used daily.
6. Thermostat setting in winter.
7. Average number of hours heating is on.
8. Average number of heating days.
9. Household income.
10. Average monthly electric bill.
11. Ranking of this electric company as compared with two previous electricity suppliers.
Describe the variables implicit in these 11 items as quantitative or qualitative, and describe the scales of measurement.
Given a set of numerical observations, we may order them according to magnitude. Once we have done this, it is possible to define the boundaries of the set. Any student who has taken a nationally administered test, such as the Scholastic Aptitude Test (SAT), is familiar with percentiles. Your score on such a test is compared with the scores of all people who took the test at the same time, and your position within this group is defined in terms of a percentile. If you are in the 90th percentile, 90% of the people who took the test received a score lower than yours. We define a percentile as follows.
The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points.
The magazine Forbes publishes annually a list of the world's wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in billions of dollars, in no particular order, is as follows:
33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18
Find the 50th and 80th percentiles of this set of the world's top 20 net worths.
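The position rule (n + 1)P/100 can be applied to the Forbes figures as follows. This is a sketch, assuming that a fractional position is resolved by linear interpolation between the two neighbouring ordered values; the text gives only the position formula, so the interpolation step and the function name are our assumptions.

```python
def percentile(values, p):
    """Pth percentile using the (n + 1) * P / 100 position rule.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

p50 = percentile(net_worth, 50)   # position 10.5: between the 10th and 11th values
p80 = percentile(net_worth, 80)   # position 16.8: between the 16th and 17th values

print(p50, round(p80, 1))
```

With these data the 10th and 11th ordered values are both 22, and the 16th and 17th are 32 and 33, so the position 16.8 lands four-fifths of the way from 32 to 33.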
Basic concept of Probability
"It is better to be roughly right than precisely wrong."
John Maynard Keynes
You all have probably heard the story about Malcolm Forbes, who once got lost floating for miles in one of his famous balloons and finally landed in the middle of a cornfield. He spotted a man coming toward him and asked, "Sir, can you tell me where I am?" The man said, "Certainly, you are in a basket in a field of corn." Forbes said, "You must be a statistician." The man said, "That's amazing, how did you know that?" "Easy," said Forbes, "your information is concise, precise, and absolutely useless!"
Nominal scale: A scale of measurement for a variable that uses a label or name to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
Ordinal scale: A scale of measurement for a variable that has the properties of nominal data and can be used to rank or order the data. Ordinal data may be nonnumeric or numeric.
Interval scale: A scale of measurement for a variable that has the properties of ordinal data and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Ratio scale: A scale of measurement for a variable that has all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
Measure of variation
Data
Qualitative or attribute: type of car owned; color of pens.
Discrete: number of children.
Continuous: time taken for an exam.
Presentation
The pineapples are the objects (individuals) of the study. If the researchers are interested in the individual weights of pineapples in the field, then the variable consists of weights. At this point, it is important to specify units of measurement and degree of accuracy of measurement. The weights could be measured to the nearest ounce or gram. Weight is a quantitative variable because it is a numerical measure. If weights of all the ready-to-harvest pineapples in the field are included in the data, then we have a population. The average weight of all ready-to-harvest pineapples in the field is a parameter.
(b) Suppose the researchers also want data on taste. A panel of tasters rates the pineapples according to the categories "poor," "acceptable," and "good." Only some of the pineapples are included in the taste test. In this case, the variable is taste. This is a qualitative or categorical variable. Because only some of the pineapples in the field are included in the study, we have a sample. The proportion of pineapples in the sample with a taste rating of "good" is a statistic.
Numerical (Quantitative) Data Presentation
[Diagram: numerical data can be presented as an ordered array, a stem-and-leaf display, or frequency distributions (histogram, polygon, ogive).]
... summarizing information from samples or populations.
Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population.
A simple random sample of n measurements from a population is a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected.
Central tendency
The mean summarizes the data in one figure: how can we summarize a wide range of measurements with a single value?
Mean × number of values = Total
When there is no trend and the values are fluctuating, the arithmetic mean is the best representative, provided the distribution is normal and not skewed.
Arithmetic mean ≥ Geometric mean ≥ Harmonic mean (for positive values, with equality only when all values are equal)
Probably the least understood, the harmonic mean is best used in situations where extreme outliers exist in the population. The harmonic mean can be manually calculated; however, most people will find it much easier to just use Excel. In Excel, the harmonic mean can be calculated by using the HARMEAN() function.
The arithmetic mean is best used in situations where:
the data are not skewed (no extreme outliers)
the individual data points are not dependent on each other (see the section below for examples of where data are interrelated, e.g., financial analysis)
Geometric means are often useful summaries for highly skewed data. When growth or a trend is observed, the geometric mean is best.
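The three means and their ordering can be compared with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). The data below are made up for illustration.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical positive values; each is double the previous one.
values = [2, 4, 8]

am = mean(values)             # (2 + 4 + 8) / 3
gm = geometric_mean(values)   # cube root of 2 * 4 * 8
hm = harmonic_mean(values)    # 3 / (1/2 + 1/4 + 1/8)

print(f"arithmetic = {am:.3f}, geometric = {gm:.3f}, harmonic = {hm:.3f}")
```

For data that grow by a constant factor, as here, the geometric mean equals the middle value, which is one reason it suits growth rates.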
Functions of statistics
Some important functions of statistics are as follows:
1. To collect and present facts in a systematic manner.
2. Helps in formulation and testing of hypotheses.
3. Helps in facilitating the comparison of data.
4. Helps in predicting future trends.
5. Helps to find the relationship between variables.
6. Simplifies the mass of complex data.
7. Helps to formulate policies.
8. Helps Government to take decisions.
Limitations of statistics
1. Does not study qualitative phenomena.
2. Does not deal with individual items.
3. Statistical results are true only on average.
4. Statistical data should be uniform and homogeneous.
5. Statistical results depend on the accuracy of the data.
6. Statistical conclusions are not universally true.
7. Statistical results can be interpreted only if the person has sound knowledge of statistics.
Data collection
Central tendency and Dispersion
Central tendency is the middle point of a distribution. Measures of central tendency are also called measures of location.
Dispersion is the spread of the data in a distribution: the extent to which the data are scattered.
There are two more characteristics: skewness and kurtosis.
Mean of individual data: $\bar{x} = \frac{\sum x}{n}$
Mean for grouped data: $\bar{x} = \frac{\sum f x}{n}$, where x = midpoint of class and f = class frequency
The arithmetic mean has the following advantages:
1. Simple to understand
2. It is unique: there is one and only one for a data set
3. It is suitable for further statistical procedures
Disadvantages:
1. Affected by extreme observations
2. It may not be representative of the whole data set
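The grouped-data mean weights each class midpoint by its frequency. A minimal sketch with a hypothetical frequency table (the class limits and frequencies are invented for illustration):

```python
# Hypothetical frequency table: (lower limit, upper limit, frequency).
classes = [
    (0, 10, 3),
    (10, 20, 5),
    (20, 30, 2),
]

n = sum(freq for _, _, freq in classes)   # total number of observations

# Sum of frequency * class midpoint, divided by n.
grouped_mean = sum(freq * (low + high) / 2 for low, high, freq in classes) / n

print(n, grouped_mean)
```

Here the midpoints are 5, 15 and 25, so the weighted total is 3*5 + 5*15 + 2*25 = 140 over 10 observations.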
Weighted average mean
Geometric mean
Median
Basic concept of Probability
Rolling of a die
Taking out a card from a deck of cards
Probability of 5 or 6: P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 1/3
Probability of spade or queen: P(s or q) = P(s) + P(q) - P(s and q) = 13/52 + 4/52 - 1/52 = 16/52
"And" and "or" are called operators.
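Both results can be confirmed by listing the outcomes and counting. A minimal sketch (the helper `prob` and the deck construction are ours):

```python
from fractions import Fraction

def prob(event, space):
    """Favourable outcomes / total outcomes for equally likely outcomes."""
    space = list(space)
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

# Rolling one die: 5 and 6 are mutually exclusive, so the probabilities add.
die = range(1, 7)
p_five_or_six = prob(lambda face: face in (5, 6), die)

# Drawing one card: spade and queen overlap in the queen of spades.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in ranks]

p_spade = prob(lambda card: card[1] == "spades", deck)
p_queen = prob(lambda card: card[0] == "Q", deck)
p_spade_and_queen = prob(lambda card: card == ("Q", "spades"), deck)
p_spade_or_queen = prob(lambda card: card[1] == "spades" or card[0] == "Q", deck)

print(p_five_or_six)
print(p_spade_or_queen, p_spade + p_queen - p_spade_and_queen)
```

Subtracting P(s and q) stops the queen of spades from being counted twice.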