Symbiosis Statistics

7/28/2019


Median and Mean of a Density Curve

The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material. The median and mean are the same for a symmetric density curve; they both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
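To make the equal-areas and balance-point ideas concrete, here is a minimal numerical sketch in Python. It assumes an exponential density with rate 1 (an example not taken from the slides) as a right-skewed curve, and approximates both points with a midpoint Riemann sum. For this density the exact mean is 1 and the exact median is ln 2 ≈ 0.693, so the mean is pulled toward the long right tail.

```python
import math


def exponential_density(x: float) -> float:
    """Right-skewed example density (exponential, rate 1): an assumed illustration."""
    return math.exp(-x) if x >= 0 else 0.0


def mean_and_median(density, lo: float, hi: float, steps: int = 200_000):
    """Approximate the balance point (mean) and the equal-areas point (median)
    of a density on [lo, hi] using a midpoint Riemann sum."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    areas = [density(x) * dx for x in xs]
    total = sum(areas)  # close to 1 for a proper density
    mean = sum(x * a for x, a in zip(xs, areas)) / total
    running, median = 0.0, xs[-1]
    for x, a in zip(xs, areas):
        running += a
        if running >= total / 2:  # first point with half of the area to its left
            median = x
            break
    return mean, median


mean, median = mean_and_median(exponential_density, 0.0, 50.0)
print(f"mean={mean:.3f} median={median:.3f}")  # mean ≈ 1, median ≈ ln 2
```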



Statistics

Founded in 1890, the Literary Digest magazine was famous for its success in conducting polls to predict winners in presidential elections. The magazine correctly predicted the winners in the presidential elections of 1916, 1920, 1924, 1928, and 1932. In the 1936 presidential contest between Alf Landon and Franklin D. Roosevelt, the magazine sent out 10 million ballots and received 1,293,669 ballots for Landon and 972,897 ballots for Roosevelt, so it appeared that Landon would capture 57% of the vote.

Well, Landon received 16,679,583 votes to the 27,751,597 votes cast for Roosevelt. Instead of getting 57% of the vote as suggested by the Literary Digest poll, Landon received only 37% of the vote. In that same 1936 presidential election, George Gallup used a much smaller poll of 50,000 subjects, and he correctly predicted that Roosevelt would win.
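The percentages follow from simple shares of the two-candidate totals. A quick Python check, as a sketch that ignores votes for other candidates (which is why the actual share comes out as 37.5% rather than a round 37%):

```python
def two_candidate_share(a: int, b: int) -> float:
    """Fraction of the two-candidate total received by candidate a."""
    return a / (a + b)


poll_share = two_candidate_share(1_293_669, 972_897)        # Literary Digest ballots
actual_share = two_candidate_share(16_679_583, 27_751_597)  # votes actually cast
print(f"poll: {poll_share:.1%}, actual: {actual_share:.1%}")  # poll: 57.1%, actual: 37.5%
```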



Flipping a coin



Data: A plural noun (the singular form is datum) which means a set of known or given things, facts. Note that data can be numerical (e.g. age of people) or non-numerical (e.g. gender of people).

statistics: Without a capital letter, i.e. in its lower-case form, this means a set of numerical data or figures that have been collected systematically.

Statistics: With a capital letter this is a proper noun that means the set of methods and theories that can be used to arrange, analyse and interpret statistics.

Variable: A quantity that varies, the opposite of a constant. For example, the number of mobile phones sold per day in a shop is a variable, whereas the number of hours in a day is a constant. In the expressions that we will use to summarize methods, a capital letter, usually X or Y, will be used to represent a variable.

Value: A specific amount that it is possible for a variable to be. For example, the number of mobile phones sold per day could be 25 or 43 or 51. These are all possible values of the variable "number of phones sold".



Random: This adjective refers to something that occurs in an unplanned way. A random variable is a variable whose observed values arise by chance. The number of new accounts a bank opens during a month is a variable that is random, whereas the number of days in a month is a variable that is not random, i.e. its observed values are pre-determined.

Distribution: The pattern exhibited by the observed values of a variable when they are arranged in order of magnitude. A theoretical distribution is one that has been deduced, rather than compiled from observed values.

Population: Generally this means the total number of persons residing in a defined area at a given time. In Statistics a population is the complete set of things we want to investigate. These may be human, such as all the people who have visited a supermarket, or inanimate, such as all the policies issued by an insurance company.

Sample: A subset of the population, that is, a smaller number of items picked from the population. A random sample is a sample whose components have been chosen in a random way, that is, on the basis that any single item in the population has no more or less chance than any other to be included in the sample.
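A random sample can be drawn with Python's standard random module. In this sketch the population of 1,000 policy numbers is hypothetical, and random.Random.sample draws without replacement so that every item has the same chance of being selected:

```python
import random

# Hypothetical population: policy numbers 1..1000 issued by an insurance company.
population = list(range(1, 1001))

rng = random.Random(2019)               # fixed seed so the draw can be reproduced
sample = rng.sample(population, k=10)   # without replacement; every item equally likely
print(sorted(sample))
```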



Copyright 2004 Pearson Education, Inc.

Business

The etymology of "business" relates to the state of being busy, either as an individual or as a society as a whole, doing commercially viable and profitable work.

A business (also known as an enterprise or firm) is an organization engaged in the trade of goods, services, or both to consumers.

Business statistics can be described as the collection, summarization, analysis, and reporting of numerical findings relevant to a business decision or situation.



Why Statistics

Time has three phases: past, present and future.

The continuity and growth of any business depend on strategic decisions based on finance, operations or markets.

Decision making is crucial, whether it is based on intuition or on information and knowledge.

Data (facts of the present) → Analysis → Information → Knowledge

Knowledge-based decisions are based on some model.

There is a time lag between awareness of an impending event or need and the occurrence of that event.

This is the lead time, and hence planning and forecasting are needed.

An occurrence is either random or has a causal relation.

Statistics helps here.



Properties of Estimators

1. Sufficiency
2. Unbiasedness
3. Resistance
4. Efficiency

Parameters: describe the population.

Statistics: describe samples, but we use them to estimate population parameters.



Samples of Two from the above population

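If

$$s^2 = \frac{\sum (y - \bar{y})^2}{n}$$

then for sample y: 1, 2 we get s² = 0.25.

If

$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$

then for the same sample we get s² = 0.50.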
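Why divide by n − 1? The population table on the slide did not survive extraction, so this sketch assumes a hypothetical population {1, 2, 3}. It enumerates every ordered sample of two drawn with replacement and averages both versions of s²: the n − 1 version averages exactly to the population variance σ² (it is unbiased), while the n version falls short. Fractions keep the arithmetic exact.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(values):
    """Sum of squared deviations from the mean, kept exact with Fraction."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values)


population = [1, 2, 3]               # hypothetical population
N, n = len(population), 2
sigma2 = sum_sq_dev(population) / N  # population variance

# Every ordered sample of two drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
avg_divide_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sum_sq_dev([1, 2]) / n, sum_sq_dev([1, 2]) / (n - 1))  # 1/4 1/2 for sample y: 1, 2
print(sigma2, avg_divide_by_n, avg_divide_by_n_minus_1)       # 2/3 1/3 2/3
```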



(1) Carefully defining the situation, (2) gathering data, (3) accurately summarizing the data, and (4) deriving and communicating meaningful conclusions.

Statistics: The science of collecting, describing, and interpreting data.

Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed.

Sample: A subset of a population.

Variable (or response variable): A characteristic of interest about each individual element of a population or sample.

Data value: The value of the variable associated with one element of a population or sample. This value may be a number, a word, or a symbol.

Data: The set of values collected from the variable from each of the elements that belong to the sample.

Experiment: A planned activity whose results yield a set of data.

Parameter: A numerical value summarizing all the data of an entire population.

Statistic: A numerical value summarizing the sample data.

Qualitative, or attribute, or categorical, variable: A variable that describes or categorizes an element of a population.



A variable is simply something that can vary: that is, it can take on many different values or categories. Examples of variables are gender, typing speed, top speed of a car, number of reported symptoms of an illness, temperature, attendances at rock festivals (e.g. the Download festival), level of anxiety, number of goals scored in football matches, intelligence, number of social encounters while walking your dog, amount of violence on television, occupation and favourite colours. These are all things that we can measure and record and that vary. We are generally interested in variables because we want to understand why they vary as they do.



Ordinal variable: A qualitative variable that incorporates an ordered position, or ranking.

Discrete variable: A quantitative variable that can assume a countable number of values. Intuitively, the discrete variable can assume any values corresponding to isolated points along a line interval. That is, there is a gap between any two values.

Continuous variable: A quantitative variable that can assume an uncountable number of values. Intuitively, the continuous variable can assume any value along a line interval, including every possible value between any two values.

Biased sampling method: A sampling method that produces data that systematically differ from the sampled population. An unbiased sampling method is one that is not biased.

Sampling frame: A list, or set, of the elements belonging to the population from which the sample will be drawn.



Data is numerical information.

[Figure: Data, Information, Analysis, Knowledge]

Data alone is useless: it has to be organized, summarized and presented, and then analyzed or used for estimation. These are the functions of statistics.

Measurement is either quantitative or qualitative.

Scales used:

Nominal Scale

Ordinal Scale

Interval Scale

Ratio Scale



Probabilities closer to 1 indicate that the event is more likely to occur. Probabilities closer to 0 indicate that the event is less likely to occur. P(A), read "P of A", denotes the probability of event A.

If P(A) = 1, the event A is certain to occur. If P(A) = 0, the event A is certain not to occur.

Probability is the basis of inferential statistics. An event is an outcome (or a set of outcomes) of an experiment.

The sample space is the collection of all possible outcomes (sample points) of an experiment.

1. Every sample point probability lies between 0 and 1.

2. The probabilities of all sample points in the sample space sum to 1.
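The two rules can be checked mechanically for a simple sample space. This Python sketch assumes a fair six-sided die and treats an event as a set of sample points:

```python
from fractions import Fraction

# Sample space for one roll of a fair die: six sample points, each with probability 1/6.
sample_space = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event):
    """Probability of an event, given as a set of sample points."""
    return sum(sample_space[point] for point in event)


rule_1 = all(0 <= p <= 1 for p in sample_space.values())  # each lies between 0 and 1
rule_2 = sum(sample_space.values()) == 1                  # together they sum to 1
print(rule_1, rule_2, prob({2, 4, 6}))  # True True 1/2
```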



Mutually exclusive events (each with a nonzero probability) are not statistically independent: if one occurs, the other cannot.

When two events are mutually exclusive, the probability of A or B occurring can be expressed by the following addition rule for mutually exclusive events: P(A or B) = P(A) + P(B).

For example, on a single draw the ace of spades and the queen of spades are mutually exclusive, so

P(As or Qs) = 1/52 + 1/52 = 2/52. Replacement matters only when drawing more than once: the probability of drawing As and then Qs is (1/52)(1/52) with replacement and (1/52)(1/51) without replacement.

If two events are not mutually exclusive, we use the

addition rule for non-mutually exclusive events: P(A or B) = P(A) + P(B) − P(A and B).

For statistically independent events, the joint probability is calculated by the product of the individual marginal probabilities: P(A and B) = P(A) × P(B). The concept of statistical dependence implies that the probability of a certain event is dependent on the occurrence of another event.
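The addition and multiplication rules can be verified by enumerating a standard 52-card deck for a single draw. In this sketch the helper names (prob, spade, queen and so on) are illustrative, not from the slides. It also shows why mutually exclusive events are dependent: P(As and Qs) is 0, not P(As) × P(Qs).

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely cards for a single draw


def prob(event) -> Fraction:
    """Classical probability of an event: favorable cards / total cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def ace_of_spades(card):
    return card == ("A", "spades")


def queen_of_spades(card):
    return card == ("Q", "spades")


def spade(card):
    return card[1] == "spades"


def queen(card):
    return card[0] == "Q"


# Mutually exclusive events: P(As or Qs) = P(As) + P(Qs)
p_as_or_qs = prob(lambda c: ace_of_spades(c) or queen_of_spades(c))

# ...but they are dependent: P(As and Qs) = 0, not P(As) * P(Qs)
p_as_and_qs = prob(lambda c: ace_of_spades(c) and queen_of_spades(c))

# Non-mutually exclusive events: P(S or Q) = P(S) + P(Q) - P(S and Q)
p_s_or_q = prob(lambda c: spade(c) or queen(c))
p_s_and_q = prob(lambda c: spade(c) and queen(c))

print(p_as_or_qs, p_as_and_qs, prob(ace_of_spades) * prob(queen_of_spades))  # 1/26 0 1/2704
print(p_s_or_q, p_s_and_q == prob(spade) * prob(queen))                       # 4/13 True
```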


The classical probability of an event is the ratio of the number of favorable outcomes, or successes, to the total number of outcomes.

The classic theory assumes that all outcomes have equal likelihood of occurring. In the example just cited, each card must have an equal chance of being chosen: no card is larger than any other or in any way more likely to be chosen than any other card. The classic theory pertains only to outcomes that are mutually exclusive (or disjoint), which means that those outcomes may not occur at the same time. For example, one coin flip can result in a head or a tail, but one coin flip cannot result in a head and a tail. So the outcome of a head and the outcome of a tail are said to be mutually exclusive in one coin flip, as are the outcomes of an ace and a king when one card is drawn.

A probability assignment based on equally likely outcomes uses the formula

P(A) = (number of outcomes favorable to A) / (total number of equally likely outcomes)
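The formula translates directly into counting. A Python sketch, assuming two fair dice (36 equally likely outcomes) as the experiment:

```python
from fractions import Fraction
from itertools import product

# Rolling two fair dice: 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def classical_probability(event):
    """P(A) = favorable outcomes / total equally likely outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


p_sum_7 = classical_probability(lambda o: o[0] + o[1] == 7)
p_sum_8 = classical_probability(lambda o: o[0] + o[1] == 8)
print(p_sum_7, p_sum_8)  # 1/6 5/36
```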


Chapter 11

Introduction to Hypothesis Testing



Nonstatistical Hypothesis Testing

A criminal trial is an example of hypothesis testing without the statistics.

In a trial a jury must decide between two hypotheses. The null hypothesis is

H0: The defendant is innocent

The alternative hypothesis or research hypothesis is

H1: The defendant is guilty




There are two possible errors.

A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. We would want the probability of this type of error [maybe 0.001, beyond a reasonable doubt] to be very small for a criminal trial where a conviction results in the death penalty, whereas for a civil trial, where conviction might result in someone




A Type II error occurs when we don't reject a false null hypothesis [accept the null hypothesis]. That occurs when a guilty defendant is acquitted. In practice, this type of error is by far the most serious mistake we normally make.

For example, if we test the hypothesis that the amount of medication in a heart pill is equal to a value which will cure your heart problem and accept the null hypothesis




The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta).

The two probabilities are inversely related: decreasing one increases the other, for a fixed sample size.

In other words, you can't make α and β both very small for a given sample size.



Types of Errors

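A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).

Decision | H0 true | H0 false
Reject H0 | Type I error | Correct decision
Do not reject H0 | Correct decision | Type II error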




The critical concepts are these:

1. There are two hypotheses, the null and the alternative hypotheses.

2. The procedure begins with the assumption that the null hypothesis is true.

3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, or that the null is not likely to be true.

4. There are two possible decisions:

Conclude that there is enough evidence to support the alternative hypothesis. Reject the null.

Conclude that there is not enough evidence to support the alternative hypothesis. Fail to reject the null.



Concepts of Hypothesis Testing (1)

One hypothesis is called the null hypothesis and the other the alternative or research hypothesis. The usual notation is:

H0: the null hypothesis (pronounced "H nought")

H1: the alternative or research hypothesis



Concepts of Hypothesis Testing

Consider mean demand for computers during assembly lead time. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. In other words, someone is claiming that the mean demand is 350 units and we want to check this claim out to see if it appears reasonable. We can rephrase this request into a test of the hypothesis:

H0: μ = 350

H1: μ ≠ 350



Concepts of Hypothesis Testing

For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence.

If x̄ is close to 350 (say, 355), we could not say that this provides a great deal of evidence to infer that the population mean is different than 350.




The two possible decisions that can be made:

Conclude that there is enough evidence to support the alternative hypothesis (also stated as: reject the null hypothesis in favor of the alternative).

Conclude that there is not enough evidence to support the alternative hypothesis (also stated as: failing to reject the null hypothesis in favor of the alternative).

NOTE: we do not say that we accept the null hypothesis if a statistician is around.




The testing procedure begins with the assumption that the null hypothesis is true.

Thus, until we have further statistical evidence, we will assume:

H0: μ = 350 (assumed to be TRUE)

The next step will be to determine the


Is the Sample Mean in the Guts of the Sampling Distribution?


Three ways to determine this: First way

1. Unstandardized test statistic: Is x̄ in the guts of the sampling distribution? It depends on what you define as the guts of the sampling distribution.

If we define the guts as the center 95% of the distribution [this means α = 0.05], then the critical values that define the guts will be 1.96 standard deviations of X-Bar on either side of the mean of the



1. Unstandardized Test Statistic Approach



Three ways to determine this: Second way

2. Standardized test statistic: Since we defined the guts of the sampling distribution to be the center 95% [α = 0.05], if the Z-score for the sample mean is greater than 1.96, we know that x̄ will be in the reject region on the right side; if the Z-score for the sample mean is less than −1.96, we know that x̄ will be in the reject region on the left side.



2. Standardized Test Statistic Approach



Three ways to determine this: Third way

3. The p-value approach (which is generally used with a computer and statistical software): Increase the rejection region until it captures the sample mean.

For this example, since x̄ is to the right of the mean, calculate

P(X̄ > 370.16) = P(Z > 1.344) = 0.0901

Since this is a two-tailed test, you must double this area for the p-value.

p-value = 2*(0.0901) = 0.1802

Since we defined the guts as the center 95% [α = 0.05], the reject region is the other 5%. Since our sample mean, x̄, is in the 18.02% region, it cannot be in our 5% rejection region.
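All three approaches reach the same decision. This Python sketch uses statistics.NormalDist (Python 3.8+). The standard error of 15 is an inference rather than a value stated on the surviving slides: it is the one implied by (370.16 − 350)/1.344. Because the code uses the exact normal CDF instead of a table lookup at z = 1.34, its p-value comes out near 0.179 rather than 0.1802.

```python
from statistics import NormalDist

# Values from the slides; se = 15 is inferred from (370.16 - 350) / 1.344.
mu0, xbar, se, alpha = 350.0, 370.16, 15.0, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# 1. Unstandardized: is x-bar between the critical values LCV and UCV?
lcv, ucv = mu0 - z_crit * se, mu0 + z_crit * se
reject_1 = not (lcv <= xbar <= ucv)

# 2. Standardized: compare the z-score with +/- z_crit.
z = (xbar - mu0) / se
reject_2 = abs(z) > z_crit

# 3. p-value: double the tail area beyond |z| and compare it with alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_3 = p_value < alpha

print(f"LCV={lcv:.1f} UCV={ucv:.1f} z={z:.3f} p={p_value:.4f}")
print(reject_1, reject_2, reject_3)  # False False False
```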



3. p-value approach



Statistical Conclusions:

Unstandardized Test Statistic:

Since LCV (320.6) < x̄ (370.16) < UCV (379.4), the sample mean lies in the guts of the sampling distribution, so we cannot reject the null hypothesis.



Example 11.1

What we want to show:

H1: μ > 170

H0: μ ≤ 170 (we'll assume this is true)

Normally we put H0 first.

We know:

n = 400, x̄ = 178, and σ = 65, so

σx̄ = σ/√n = 65/√400 = 3.25



Example 11.1 Rejection Region

The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.

The critical value of x̄ is the boundary of the rejection region for H0.



Example 11.1

At a 5% significance level (i.e. α = 0.05), we get [all in one tail]

zα = z.05 = 1.645

Therefore, UCV = 170 + 1.645*3.25 = 175.35

Since our sample mean (178) is greater than the critical value we calculated (175.35), we reject the null hypothesis in favor of H1. OR, using the standardized test statistic, z = (178 − 170)/3.25 = 2.46 > 1.645, so we reject the null.
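The same one-tail calculation in Python, as a sketch using statistics.NormalDist with the Example 11.1 values (n = 400, x̄ = 178, σ = 65). The exact normal quantile 1.6449 replaces the table value 1.645, and the right-tail p-value comes out close to 0.0069:

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, sigma, n, alpha = 170.0, 178.0, 65.0, 400, 0.05
se = sigma / sqrt(n)                        # standard error of x-bar: 3.25
z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645, all of alpha in the right tail

ucv = mu0 + z_alpha * se                    # critical value of x-bar
z = (xbar - mu0) / se                       # standardized test statistic
p_value = 1 - NormalDist().cdf(z)           # right-tail area only

print(f"se={se} UCV={ucv:.2f} z={z:.2f} p={p_value:.4f}")
print("reject H0" if xbar > ucv else "fail to reject H0")
```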


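Example 11.1 The Big Picture

[Figure: sampling distribution of x̄ under H0: μ = 170, with critical value 175.35 and sample mean x̄ = 178 falling in the rejection region; reject H0 in favor of H1: μ > 170.]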



Interpreting the p-value

The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.



Interpreting the p-value

[Figure: p-value scale. Below .01: overwhelming evidence (highly significant); .01 to .05: strong evidence (significant); .05 to .10: weak evidence (not significant); above .10: no evidence (not significant). Marked on the scale: p = .0069.]



Conclusions of a Test of Hypothesis

If we reject the null hypothesis, we conclude that there is enough evidence to infer that the alternative hypothesis is true.

If we fail to reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true. This does not mean that we have proven that the null hypothesis is true!
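The verbal scale for p-values and the recommended wording of conclusions can be captured in two small helper functions. They are a sketch: the function names are hypothetical, and the handling of boundary values (for example, p exactly equal to 0.05 is treated as weak evidence and does not lead to rejection at α = 0.05) is an assumption the slides do not settle.

```python
def describe_p_value(p: float) -> str:
    """Map a p-value to the verbal evidence scale; band edges go to the weaker band."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < 0.01:
        return "overwhelming evidence (highly significant)"
    if p < 0.05:
        return "strong evidence (significant)"
    if p < 0.10:
        return "weak evidence (not significant)"
    return "no evidence (not significant)"


def conclusion(p: float, alpha: float = 0.05) -> str:
    """Word the decision as recommended: never 'accept H0'."""
    if p < alpha:
        return "reject H0: enough evidence to infer that H1 is true"
    return "fail to reject H0: not enough evidence to infer that H1 is true"


for p in (0.0069, 0.1802):
    print(p, describe_p_value(p), "|", conclusion(p))
```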


One tail test with rejection region on right

The last example was a one-tail test, because the rejection region is located in only one tail of the sampling distribution.

More correctly, this was an example of a right-tail test.



One tail test with rejection region on left

The rejection region will be in the left tail.

Two tail test with rejection region in both tails

The rejection region is split equally between the two tails.



Example 11.2 Students' work

AT&T argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively). Note: we don't know the true value of σ, so we estimate it from the data [σ ≈ s = 3.87]; the sample is large, so don't worry.

They then sample 100 customers at



Example 11.2

The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.

[Figure: rejection regions in both tails, labelled "stat is small" (left) and "stat is large" (right).]



Example 11.2

At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:

z < −1.96 or z > 1.96



Example 11.2

From the data, we calculate x̄ = 17.55.

Using our standardized test statistic:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{17.55 - 17.09}{3.87/\sqrt{100}}$$

We find that:

z = 1.19

Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis.

Summary of One- and Two-Tail Tests

[Figure: rejection regions for a one-tail test (left tail), a two-tail test, and a one-tail test (right tail).]
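The critical values behind these three pictures come from the standard normal quantile function. A Python sketch (the function name rejection_region is hypothetical) that puts all of α in one tail for a one-tail test and splits it equally between the tails for a two-tail test:

```python
from statistics import NormalDist


def rejection_region(alpha: float, tail: str) -> str:
    """Describe the z rejection region for a test at significance level alpha."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return f"z < {std.inv_cdf(alpha):.3f}"
    if tail == "right":
        return f"z > {std.inv_cdf(1 - alpha):.3f}"
    if tail == "two":
        c = std.inv_cdf(1 - alpha / 2)  # alpha split equally between the tails
        return f"z < {-c:.3f} or z > {c:.3f}"
    raise ValueError("tail must be 'left', 'right' or 'two'")


for tail in ("left", "two", "right"):
    print(tail, rejection_region(0.05, tail))
```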



Probability of a Type II Error

A Type II error occurs when a false null hypothesis is not rejected, or when you accept the null when it is not true (but don't say it this way if a statistician is around).

In practice, this is by far the most serious error you can make in most cases, especially in the quality field.



Judging the Test

A statistical test of hypothesis is effectively defined by the significance level (α) and the sample size (n), both of which are selected by the statistics practitioner.

Therefore, if the probability of a Type II error (β) is too large [we have insufficient power], we can reduce it by

increasing α, and/or

increasing the sample size n.
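The effect of α and n on β can be computed for a right-tail z-test. This Python sketch reuses the Example 11.1 setting (H0: μ = 170, σ = 65) and assumes a hypothetical true mean of 180; the function name beta_right_tail is illustrative. Power is 1 − β.

```python
from math import sqrt
from statistics import NormalDist


def beta_right_tail(mu0, mu_true, sigma, n, alpha):
    """P(Type II error) for a right-tail z-test of H0: mu = mu0
    when the true mean is mu_true (> mu0)."""
    se = sigma / sqrt(n)
    critical_xbar = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject if x-bar exceeds this
    # We fail to reject when x-bar lands below the critical value, computed
    # under the true sampling distribution centred at mu_true.
    return NormalDist(mu_true, se).cdf(critical_xbar)


# Example 11.1 settings with a hypothetical true mean of 180.
for alpha in (0.10, 0.05, 0.01):   # fixed n: smaller alpha gives larger beta
    print("alpha", alpha, round(beta_right_tail(170, 180, 65, 400, alpha), 4))
for n in (100, 400, 900):          # fixed alpha: larger n gives smaller beta
    print("n", n, round(beta_right_tail(170, 180, 65, n, 0.05), 4))
```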



Judging the Test

The power of a test is defined as 1 − β. It represents the probability of rejecting the null hypothesis when it is false and the true mean is something other than the null value for the mean.

If we are testing the hypothesis that the average amount of medication in blood pressure pills is equal to 6 mg (which is good), and we fail to reject the null hypothesis, ship the pills to patients worldwide, only to find out later that the true average amount of medication is really 7 mg and people die, we get in trouble. This occurred because P(reject the null | true mean = 7 mg) = 0.32, which would mean that we have a 68% chance of not rejecting the null for these BAD pills and shipping them to patients worldwide.



The probability that you ship pills whose mean amount of medication is 7 mg is approximately 67%.

Definition: When we select a sample from a population and then try to estimate the population parameter from the sample, we will not be entirely accurate. The difference between the population parameter and the sample statistic is the sampling error.
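A short simulation makes the definition concrete. The population of 1,000 monthly bills is hypothetical (generated from a seeded random number generator), and the code reports the sampling error as statistic minus parameter; the opposite sign convention is equally common.

```python
import random
from statistics import mean

rng = random.Random(42)  # seeded so the run is reproducible

# Hypothetical population: 1,000 monthly phone bills in dollars.
population = [round(rng.uniform(10, 30), 2) for _ in range(1000)]
mu = mean(population)            # population parameter

sample = rng.sample(population, k=25)
xbar = mean(sample)              # sample statistic

sampling_error = xbar - mu       # statistic minus parameter
print(f"mu = {mu:.2f}, x-bar = {xbar:.2f}, sampling error = {sampling_error:+.2f}")
```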



Data collection



Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data. The goal of statistics is to gain understanding from data.

Individuals are the people or objects included in the study. A variable is a characteristic of the individual to be measured or observed.

A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense. A qualitative variable describes an individual by placing the individual into a category or group, such as male or female; qualitative variables are also called categorical variables. In population data, the data are from every individual of interest. In sample data, the data are from only some of the individuals of interest.

A parameter is a numerical measure that describes an aspect of a population. A statistic is a numerical measure that describes an aspect of a sample.

DATA



Summarizing the data: Summarization is a process in which the data is reduced for interpretation without sacrificing any important information.

Data analysis tasks: finding hidden relationships, anomalies and trends; estimating; predicting.


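[Figure: classification of data. Each element yields a datum for each variable. Qualitative variables are measured on a nominal or ordinal scale; quantitative variables are discrete or continuous and are measured on an interval or ratio scale.]

A population consists of the set of all measurements in which the investigator is interested. The population is also called the universe.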



A sample is a subset of measurements selected from the population. Sampling from the population is often done randomly, such that every possible sample of n elements will have an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.

A survey by an electric company contains questions on the following:

1. Age of household head.
2. Sex of household head.
3. Number of people in household.
4. Use of electric heating (yes or no).
5. Number of large appliances used daily.
6. Thermostat setting in winter.
7. Average number of hours heating is on.
8. Average number of heating days.
9. Household income.
10. Average monthly electric bill.
11. Ranking of this electric company as compared with two previous electricity suppliers.

Describe the variables implicit in these 11 items as quantitative or qualitative, and describe the scales of measurement.

Given a set of numerical observations, we may order them according to magnitude. Once we have done this, it is possible to define the boundaries of the set. Any student who has taken a nationally administered test, such as the Scholastic Aptitude Test (SAT), is familiar with percentiles. Your score on such a test is compared with the scores of all people who took the test at the same time, and your position within this group is defined in terms of a percentile. If you are in the 90th percentile, 90% of the people who took the test received a score lower than yours. We define a percentile as follows.

The Pth percentile of a group of numbers is that value below which lie P% (P percent) of the numbers in the group. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of data points.

The magazine Forbes publishes annually a list of the world's wealthiest individuals. For 2007, the net worth of the 20 richest individuals, in billions of dollars, in no particular order, is as follows:

33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18

Find the 50th and 80th percentiles of this set of the world's top 20 net worths.
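One way to work the exercise in Python is a small function applying the (n + 1)P/100 position rule, with linear interpolation when the position is fractional (the interpolation step is an assumption, though it is the usual reading of the rule). It gives 22 for the 50th percentile (position 10.5) and 32.8 for the 80th (position 16.8).

```python
def percentile(data, p):
    """Pth percentile using the (n + 1)P/100 position rule, interpolating
    linearly between neighbouring values of the ordered array."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100   # 1-based position in the ordered array
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    whole = int(pos)          # floor, since pos > 1
    frac = pos - whole
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]  # billions of dollars, 2007
print(round(percentile(net_worth, 50), 2), round(percentile(net_worth, 80), 2))  # 22.0 32.8
```

Python's statistics.quantiles(net_worth, n=100) uses the same (n + 1) positioning by default (method="exclusive"), so its cut points at indices 49 and 79 should agree with these values.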



Basic concept of Probability


It is better to be roughly right than precisely wrong.

John Maynard Keynes

You all have probably heard the story about Malcolm Forbes, who once got lost floating for miles in one of his famous balloons and finally landed in the middle of a cornfield. He spotted a man coming toward him and asked, "Sir, can you tell me where I am?" The man said, "Certainly, you are in a basket in a field of corn." Forbes said, "You must be a statistician." The man said, "That's amazing, how did you know that?" "Easy," said Forbes, "your information is concise, precise, and absolutely useless!"



Nominal scale: A scale of measurement for a variable that uses a label or name to identify an attribute of an element. Nominal data may be nonnumeric or numeric.

Ordinal scale: A scale of measurement for a variable that has the properties of nominal data and can be used to rank or order the data. Ordinal data may be nonnumeric or numeric.

Interval scale: A scale of measurement for a variable that has the properties of ordinal data and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.

Ratio scale: A scale of measurement for a variable that has all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
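The difference between interval and ratio scales shows up when units change. This Python sketch uses temperature (an interval scale) and weight (a ratio scale) as assumed examples not taken from the slides: temperature differences convert consistently between Celsius and Fahrenheit, but temperature ratios do not, whereas a ratio of weights is the same in kilograms or pounds.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    return kg * 2.20462  # approximate conversion factor


# Interval scale: a difference converts consistently under a change of unit...
diff_c = 20 - 10
diff_f = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
print(diff_f / diff_c)  # 1.8, the same 10-degree gap rescaled

# ...but a ratio does not, because the zero point is arbitrary.
print(20 / 10, celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10))  # 2.0 vs 1.36

# Ratio scale: the ratio is the same in any unit.
print(20 / 10, kg_to_lb(20) / kg_to_lb(10))  # 2.0 in both units
```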

Measure of variation


[Figure: types of data. Qualitative (attribute) data, e.g. type of car owned, color of pens; quantitative data, either discrete, e.g. number of children, or continuous, e.g. time taken for an exam.]

The pineapples are the objects (individuals) of the study. If the researchers are interested in the individual weights of pineapples in the field, then the variable consists of weights. At this point, it is important to specify units of measurement and degree of accuracy of measurement. The weights could be measured to the nearest ounce or gram. Weight is a quantitative variable because it is a numerical measure. If weights of all the ready-to-harvest pineapples in the field are included in the data, then we have a population. The average weight of all ready-to-harvest pineapples in the field is a parameter.

(b) Suppose the researchers also want data on taste. A panel of tasters rates the pineapples according to the categories poor, acceptable, and good. Only some of the pineapples are included in the taste test. In this case, the variable is taste. This is a qualitative or categorical variable. Because only some of the pineapples in the field are included in the study, we have a sample. The proportion of pineapples in the sample with a taste rating of good is a statistic.


Numerical (Quantitative) Data Presentation

[Figure: numerical data can be presented as an ordered array and a stem-and-leaf display, or grouped into frequency distributions shown as histograms, polygons and ogives.]
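As an illustration of two of these presentations, the sketch below builds an ordered array, a stem-and-leaf display and a frequency distribution for the Forbes net-worth data listed earlier in these notes. Using the tens digit as the stem and a class width of 10 are assumptions chosen for readability.

```python
from collections import Counter, defaultdict

net_worth = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
             27, 22, 18, 49, 22, 20, 23, 32, 20, 18]  # billions of dollars

ordered_array = sorted(net_worth)

# Stem-and-leaf display: stem = tens digit, leaf = units digit (whole numbers assumed).
leaves = defaultdict(list)
for x in ordered_array:
    leaves[x // 10].append(x % 10)
for stem in range(min(leaves), max(leaves) + 1):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}")

# Frequency distribution with classes 10-19, 20-29, ... (class width 10 assumed).
freq = Counter(x // 10 * 10 for x in net_worth)
for lower in sorted(freq):
    print(f"{lower}-{lower + 9}: {freq[lower]}")
```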

Descriptive statistics involves methods of summarizing information from samples or populations. Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population.

A simple random sample of n measurements from a population is a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected.

Probability



Central tendency

The mean summarizes the data in one figure: how can we summarize a wide range of measurements with a single value? Mean × number of observations = total. When there is no trend and values fluctuate, the arithmetic mean is the best representative, provided the distribution is normal and not skewed. For positive data, arithmetic mean ≥ geometric mean ≥ harmonic mean, with equality only when all the values are equal.

Probably the least understood, the harmonic mean is best used in situations where extreme large outliers exist in the population, since it gives less weight to large values. The harmonic mean can be manually calculated; however, most people will find it much easier to just use Excel. In Excel, the harmonic mean can be calculated by using the HARMEAN() function.

The arithmetic mean is best used in situations where:

the data are not skewed (no extreme outliers);

the individual data points are not dependent on each other (see the section below for examples of where data are interrelated, e.g., financial analysis).

Geometric means are often useful summaries for highly skewed data. When growth or a trend is observed, the geometric mean is best.
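Python's statistics module provides all three means (harmonic_mean plays the role of Excel's HARMEAN). In this sketch the growth factors are hypothetical; it shows AM ≥ GM ≥ HM and why the geometric mean is the right average for growth: compounding it reproduces the actual total growth, while compounding the arithmetic mean overstates it.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical year-on-year growth factors: +10%, -10%, +20%.
factors = [1.10, 0.90, 1.20]

am = mean(factors)
gm = geometric_mean(factors)   # Python 3.8+
hm = harmonic_mean(factors)
print(f"AM={am:.4f} GM={gm:.4f} HM={hm:.4f}")  # AM >= GM >= HM

total_growth = 1.10 * 0.90 * 1.20
# Three years at the GM reproduce the total growth; three years at the AM overshoot it.
print(round(gm ** 3, 4), round(am ** 3, 4), round(total_growth, 4))
```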

Functions of statistics

Some important functions of statistics are as follows:

1. To collect and present facts in a systematic manner.
2. Helps in the formulation and testing of hypotheses.
3. Helps in facilitating the comparison of data.
4. Helps in predicting future trends.
5. Helps to find the relationship between variables.
6. Simplifies the mass of complex data.
7. Helps to formulate policies.
8. Helps the Government to take decisions.

Limitations of statistics

1. Does not study qualitative phenomena.
2. Does not deal with individual items.
3. Statistical results are true only on an average.
4. Statistical data should be uniform and homogeneous.
5. Statistical results depend on the accuracy of data.
6. Statistical conclusions are not universally true.
7. Statistical results can be interpreted only by a person who has sound knowledge of statistics.

Data collection


Central tendency and Dispersion

Central tendency is the middle point of a distribution; measures of central tendency are also called measures of location. Dispersion is the spread of data in a distribution, the extent to which the data are scattered. Two more characteristics of a distribution are skewness and kurtosis.

Mean of individual data: x̄ = Σx/n. Mean for grouped data: x̄ = Σfx/n, where x = midpoint of the class.

The arithmetic mean has the following advantages:
1. Simple to understand.
2. There is one and only one mean for a data set.
3. The mean is suitable for further statistical procedures.

Disadvantages:
1. Affected by extreme observations.
2. It may not be representative of the whole data, for example when the data are skewed.
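Both mean formulas are one-liners in Python. The data values and class intervals below are hypothetical; the last line shows the first disadvantage, a single extreme observation dragging the mean upward.

```python
def mean_individual(xs):
    """x-bar = (sum of x) / n for ungrouped data."""
    return sum(xs) / len(xs)


def mean_grouped(classes):
    """x-bar = (sum of f * x) / n, where x is the class midpoint and n is the
    sum of the frequencies. `classes` holds (lower, upper, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


print(mean_individual([4, 8, 15, 16, 23, 42]))               # 18.0
print(mean_grouped([(0, 10, 3), (10, 20, 5), (20, 30, 2)]))  # 14.0
print(mean_individual([4, 8, 15, 16, 23, 420]))              # 81.0: one extreme value
```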

Weighted average mean

Geometric mean

Median

Basic concept of Probability

Rolling a die



Drawing a card from a deck of cards

Probability of a 5 or a 6 on one roll of a die: P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 1/3

Probability of a spade or a queen: P(S or Q) = P(S) + P(Q) − P(S and Q) = 13/52 + 4/52 − 1/52 = 16/52

"And" and "or" are called operators.


