Post on 04-Jun-2018
8/13/2019 Biostat lec01 basicconcepts
Biostatistics School of Biotechnology International University
Slide 1
Basic Concepts of Statistics
Dang Quoc Tuan, Ph.D
School of Biotechnology
International University
Lecture 1
Slide 2
Statistics
What is Statistics?
Why Statistics?
- Statistic vs. Statistics?
How to learn Statistics?
- Common sense vs. mathematical expertise
- Almost all fields of study benefit from the application of statistical methods
Slide 3
Goals of the lecture:
To introduce fundamental concepts and definitions in Statistics:
- Statistic vs. Statistics
- Descriptive and inferential statistics
- Population vs. sample
- Parameter vs. statistic
- Variable, random variable, random number
- Data, types of data
- Observation, event, measurement
- Experiment, treatment, replication
- Sampling, types of sampling, sampling error
Slide 4
Biostatistics
Statistics is the collection, processing, interpretation
and presentation of numerical information.
Biostatistics is the application of statistics to questions
about living systems.
Biostatistics is an umbrella term that encompasses statistical
research in several subject matter areas. These areas include
pharmacology, medicine, biology, genetics, biotechnology, food
technology and public health.
Slide 5
Biostatistics
Statistics is critical in analyzing patterns of genomic variation within populations, and in relating this variation to disease states or other phenotypes
- Genomes differ from the reference copy (single nucleotide polymorphisms, structural variants)
- Gene mapping by linkage and association methods
Statistics supports analyses to determine the function of genes/transcripts/proteins
Slide 6
Introduction to Statistics
1. Population and Data
2. Types of Data
3. Critical Thinking
4. Design of Experiments
Slide 7
Overview
A common goal of surveys and other data
collecting tools is to collect data from a smaller
part of a larger group so we can learn
something about the larger group.
In this section we will look at some of the ways to
describe data.
Slide 8
Overview
Statistics has two meanings:
- Specific numbers
- A method of analysis (a way of thinking) and a field of study
Slide 9
Overview
Specific number: a numerical measurement determined by a set of data
Example:
- 23% of people polled believed that there are too many polls.
- Average age of Vietnamese men in 2000 is 68.
- Price index in December increased by 1% compared to that in November
Slide 10
Definitions
Statistics = method of analysis
a collection of methods for planning experiments,
obtaining data, and then organizing, summarizing,
presenting, analyzing, interpreting, and drawing
conclusions based on the data.
= drawing inferences (generalizations) about
large groups (populations) on the basis of
observations made on smaller ones (samples)
Slide 11
Definitions
Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied.
The collection is complete in the sense that it includes all individual items or units which are the subject of investigation.
Unit = an individual of the population
Slide 12
Definitions
Census: the collection of data from every member of the population
Sample: a sub-collection of elements drawn from a population
Sample size: number of units in the sample (or % of units from the population)
Slide 13
Definitions
Variable: a characteristic of a population which differs from unit to unit
Data: observations on the variable (such as measurements, degrees, orders, properties, outcomes, results) that have been measured and collected
Slide 14
Example:
Population: all 1st-year students in the School of Biotechnology
Unit: a student
Variable: score in a math final exam
Observation (data): points of the score (60, 71, 95, etc.)
Sample: a group of 30 students
Sample size: 30
Slide 15
Random sampling and random numbers
Sample data must be collected in an appropriate way, such as through a process of random selection
(each unit in a population must have an equal chance of being drawn)
If sample data are not collected in an appropriate way, the data may be completely useless, and information
may not be properly extrapolated to the population
Slide 16
Random sampling and random numbers
Random number
Select units to be measured by reference to random numbers
The way to avoid bias
Random number table (in any statistics book)
Computer: MINITAB, Excel, SAS, Statgraphics, SPSS, etc.
Calculator (some versions)
Slide 17
Random Sampling
selection so that each unit has an equal chance of being selected
In Excel: RANDBETWEEN(a, b)
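Excel's RANDBETWEEN(a, b) returns a random integer between a and b inclusive, so repeated calls can return the same number more than once. As a minimal sketch of the same idea in Python, the snippet below draws units without repetition using the standard library's random.sample. The population of 200 student ID numbers, the function name, and the seed are hypothetical and chosen only for illustration.

```python
import random


def draw_simple_random_sample(units, n, seed=None):
    """Return n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(units, n)    # sampling without replacement


# Hypothetical population: student ID numbers 1..200
student_ids = list(range(1, 201))

# Draw a sample of size 30, as in the exam-score example
sample = draw_simple_random_sample(student_ids, 30, seed=1)
print(sorted(sample))
```

Fixing the seed is useful when the selection has to be documented and repeated; leaving it as None gives a fresh draw each time.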
Slide 18
Table of random numbers
Slide 19
Descriptive and inferential statistics
Descriptive Statistics
summarize or describe the important characteristics of a known set of population data
Inferential Statistics
use sample data to make inferences (or generalizations) about a population
Slide 20
Types of Data
Processing Data
Slide 21
Definitions
Parameter: a numerical measurement describing some characteristic of a population
population → parameter
Slide 22
Definitions
Statistic: a numerical measurement describing some characteristic of a sample
sample → statistic (e.g. X̄, S)
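The distinction between a parameter and a statistic can be illustrated with a small sketch. By convention, μ and σ denote the population mean and standard deviation, while x̄ and s denote their sample counterparts. The population of 500 exam scores below is simulated and purely hypothetical; in practice the parameters are usually unknown and only the statistics can be computed.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: exam scores of 500 students (simulated data)
population = [rng.randint(40, 100) for _ in range(500)]

# Parameters: computed from every unit in the population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)    # population standard deviation

# Statistics: computed from a random sample of 30 units
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)             # sample standard deviation (n - 1)

print(f"parameter mu    = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
print(f"parameter sigma = {sigma:.2f}, statistic s     = {s:.2f}")
```

Drawing a different sample changes x̄ and s, while μ and σ stay fixed for the given population.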
Slide 23
Definitions
Quantitative data
Numbers representing counts or measurements
Example: weights, lengths, ages, pressure, temperature
Slide 24
Definitions
Qualitative (or categorical or attribute) data
can be separated into different categories
that are distinguished by some non-numeric characteristics.
Example: genders (male/female),
colors (blue, red, ...)
marital status
levels of satisfaction
Slide 25
Working with Quantitative Data
Quantitative data:
- Measure of quantity
- Can compare one to others (more or less)
- Can calculate an average
Quantitative data can further be distinguished between discrete and continuous types
Slide 26
Definitions
Discrete data result when the number of possible values is either a finite number or a countable number of possible values
0, 1, 2, 3, . . .
Example: The number of eggs that hens lay
Slide 27
Definitions
Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps
Example: The amount of milk that a cow produces; e.g. 2.343115 gallons per day
Slide 28
Levels of Measurement
Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides
Slide 29
Definitions
nominal level of measurement
characterized by data that consist of names, labels, or
categories only. The data cannot be arranged in an
ordering scheme (such as low to high)
Example:
- Survey responses: yes, no, undecided
- Marital status: single, married, divorced, widowed
Slide 30
Definitions
ordinal level of measurement
involves data that may be arranged in some order, but differences
between data values either cannot be determined or are meaningless. It
is used to indicate rank order, but nothing more
Examples:
- Course grades A, B, C, D, or F
- Scores given to answers to a question such as "How often do you use a bus service?":
- Very often: 5
- Often: 4
- Occasionally: 3
- Rarely: 2
- Never: 1
It gives a bit more information than the nominal level, but an average still cannot be meaningfully calculated
Slide 31
Definitions
interval level of measurement
like the ordinal level, with the additional property that the
difference between any two data values is meaningful.
However, there is no natural zero starting point (where
none of the quantity is present).
- Intervals can be added or subtracted, but values cannot be divided (the
ratio makes no sense)
Dates are a very widely used interval scale.
Example:
- Years 1000, 2000, 1776, and 1492
- Temperature: 5°C, 10°C, 20°C
- 1st, 5th, 10th day in a month
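A short worked example may help show why ratios are not meaningful on an interval scale. Using the temperatures 10°C and 20°C from the slide, the difference of 10 degrees is meaningful, but 20°C is not "twice as hot" as 10°C, because 0°C is not a natural zero. Converting to kelvin, which is a ratio scale with a true zero, gives the physically meaningful ratio. The helper function name is an illustrative choice.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin (a scale with a true zero)."""
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale
difference = t2_c - t1_c

# Dividing Celsius values gives 2.0, but this number has no physical meaning
naive_ratio = t2_c / t1_c

# On the kelvin (ratio) scale the ratio is meaningful, and close to 1
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference   = {difference} degrees")
print(f"naive ratio  = {naive_ratio}")
print(f"kelvin ratio = {kelvin_ratio:.4f}")
```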
Slide 32
Definitions
ratio level of measurement
the interval level modified to include the natural zero
starting point (where zero indicates that none of the
quantity is present). For values at this level, differences
and ratios are meaningful. It incorporates the properties of the interval, ordinal and nominal levels
Example:
- Prices of college textbooks ($0 represents no cost)
- Measurements of mass and length
Slide 33
Summary - Levels of Measurement
Nominal - categories only
Ordinal - categories with some order
Interval - differences but no natural starting point
Ratio - differences and a natural starting point
Slide 34
Summary - Levels of Measurement
Data
- Qualitative: Nominal, Ordinal
- Quantitative: Interval, Ratio
Slide 35
Recap
In the previous sections we have looked at:
- Basic definitions and terms describing data
- Parameters versus statistics
- Types of data (quantitative and qualitative)
- Levels of measurement
Slide 36
Critical Thinking
Slide 37
Success in Statistics
Success in the introductory statistics
course typically requires more common sense than mathematical expertise
This section is designed to illustrate
how common sense is used when we think critically about data and statistics
Slide 38
Limitations of Statistics:
- Statistics does not prove anything; it only shows
the chance that some event occurs
- It may lead to some misuse
Slide 39
Abuses (or misuses) of Statistics
Bad Samples
self-selected survey
(or voluntary response sample)
one in which the respondents themselves decide whether to be
included
In this case, valid conclusions can be made only about the
specific group of people who agree to participate.
Slide 40
Abuses of Statistics
Loaded Questions
Misleading Graphs
Bad Samples
Small Samples
Slide 41
Figure. Salaries of People with Bachelor's Degrees and with High School Diplomas. Both panels show the same two bars (Bachelor's Degree: $40,500; High School Diploma: $24,400). Panel (a): vertical axis from $20,000 to $40,000. Panel (b): vertical axis from $0 to $40,000.
Slide 42
Misleading Graphs
We should analyze the numerical information given in the graph instead of being misled by its general shape
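The effect of a truncated axis in the salary figure can be checked with a little arithmetic. The sketch below compares the ratio of the two salaries with the ratio of the drawn bar heights, assuming (as the tick values suggest) that one panel's vertical axis starts at $20,000 and the other's starts at $0. The function name is an illustrative choice.

```python
bachelor = 40_500      # salary shown for Bachelor's Degree
high_school = 24_400   # salary shown for High School Diploma


def bar_height_ratio(a, b, baseline):
    """Ratio of drawn bar heights when the vertical axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Axis starting at 0: bar heights are proportional to the salaries
true_ratio = bar_height_ratio(bachelor, high_school, baseline=0)

# Axis starting at 20,000: the taller bar looks far larger than it should
truncated_ratio = bar_height_ratio(bachelor, high_school, baseline=20_000)

print(f"ratio of salaries:               {true_ratio:.2f}")
print(f"ratio of bars on truncated axis: {truncated_ratio:.2f}")
```

The salaries differ by a factor of about 1.66, but on the truncated axis the bars differ by a factor of about 4.66, which is why the numbers should be read rather than the shapes.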
Slide 43
Misuses of Statistics
Bad Samples
Small Samples
Misleading Graphs
Pictographs
Distorted Percentages
Loaded Questions
Order of Questions
Refusals
Correlation & Causality
Self Interest Study
Precise Numbers
Partial Pictures
Deliberate Distortions
Slide 44
Design of Experiments
Slide 49
Sample Size
use a sample size that is large enough to see the true nature of any effects and obtain that sample using an appropriate method, such as
one based on randomness
Slide 50
Definitions
Random Sample: members of the population are selected in such a way that each individual member has an equal chance of being selected
Simple Random Sample (of size n): subjects selected in such a way that every
possible sample of the same size n has the
same chance of being chosen
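The definition of a simple random sample can be made concrete by counting the possible samples. For a population of N units there are C(N, n) distinct samples of size n, and under simple random sampling each is chosen with probability 1 / C(N, n). The sketch below enumerates them for a tiny hypothetical population of five labelled units.

```python
import itertools
import math

# Hypothetical population of N = 5 units
population = ["A", "B", "C", "D", "E"]
n = 2

# Every possible sample of size n (order does not matter)
all_samples = list(itertools.combinations(population, n))

# The count agrees with the binomial coefficient C(N, n)
count = math.comb(len(population), n)

# Under simple random sampling each sample is equally likely
probability_of_each = 1 / count

print(all_samples)
print(f"{count} possible samples, each with probability {probability_of_each}")
```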
Slide 51
Random sample
Randomness: why is randomness important in statistics?
Random sample = representative sample
- The best way to get a representative sample = choose a proportion of a population at random
- Every possible experimental unit has an equal chance of being selected, without bias
Slide 52
Random Sampling - selection so that each unit has an equal chance of being selected
Slide 53
Systematic Sampling - Select some starting point and then select every k-th element in the population
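Systematic sampling can be sketched in a few lines. In this illustration the starting point is chosen at random among the first k units, which is a common choice but an assumption here, since the slide only says "some starting point". The population of 100 numbered units and the function name are hypothetical.

```python
import random


def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random starting index in 0 .. k-1
    return units[start::k]


# Hypothetical population of 100 numbered units
units = list(range(100))
picked = systematic_sample(units, k=10, seed=3)
print(picked)
```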
Slide 54
Stratified Sampling - subdivide the population into at
least two different subgroups that share the same characteristics, then draw a sample from each
subgroup (or stratum)
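A minimal sketch of stratified sampling follows. It groups units by a stratum label and then draws a simple random sample of the same size from each stratum; drawing equal numbers per stratum is one possible design (proportional allocation is another) and is an assumption here. The student records and function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Hypothetical population: (student id, gender) records, 20 per stratum
students = [(i, "female" if i % 2 else "male") for i in range(40)]

# Draw 5 students from each gender stratum
picked = stratified_sample(students, lambda s: s[1], n_per_stratum=5, seed=0)
print(picked)
```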
Slide 55
Cluster Sampling - divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters
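Cluster sampling differs from stratified sampling in that whole clusters are selected at random and every member of a selected cluster is included. The sketch below illustrates this with four hypothetical classes of five students each; the class names and function name are made up for the example.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and return all members of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


# Hypothetical population: 4 classes (clusters) of 5 students each
clusters = {
    f"class_{label}": [f"{label}{i}" for i in range(1, 6)]
    for label in "ABCD"
}

chosen, members = cluster_sample(clusters, n_clusters=2, seed=7)
print(chosen)
print(members)
```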
Slide 56
Major Points
If sample data are not collected in an appropriate way, the data may be completely useless.
Randomness typically plays a critical role in determining which data to collect.
Slide 57
Methods of Sampling
Random
Systematic
Stratified
Cluster
Slide 58
Definitions
Sampling Error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations
Nonsampling Error: sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data
incorrectly)
Precision vs. Accuracy?
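Sampling error can be illustrated with a small simulation. The sketch below builds a hypothetical population of 10,000 simulated heights, then repeatedly draws random samples and records how far each sample mean falls from the population mean. All numbers (mean 170, standard deviation 10, sample sizes 10 and 1000, 200 repetitions) are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated heights in cm
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # the true population result


def sampling_error(n):
    """Difference between the mean of one random sample of size n and mu."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - mu


# Repeat the sampling 200 times for a small and a large sample size
errors_small = [abs(sampling_error(10)) for _ in range(200)]
errors_large = [abs(sampling_error(1000)) for _ in range(200)]

print(f"typical error, n = 10:   {statistics.mean(errors_small):.3f}")
print(f"typical error, n = 1000: {statistics.mean(errors_large):.3f}")
```

The error varies from sample to sample purely by chance, and it tends to shrink as the sample size grows. A nonsampling error, such as a miscalibrated instrument, would not shrink this way.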
Slide 59
Recap
In this section we have looked at:
Types of studies and experiments
Controlling the effects of variables
(replication and sample size)
Randomization
Types of sampling
Sampling errors
Slide 60
HOMEWORK
Chernick: Introductory Biostatistics for the Health Sciences
2.1; 2.2; 2.8; 2.14
3.1