Chapter 1
Definitions
Population and sample
Population - collection containing all individual objects or events of interest
It is often unreasonable or impossible to examine the whole population
Sample - relatively small collection taken from a population that is studied to gain understanding about the population
A sample is biased to the degree that it under- or over-represents the population
Descriptive statistics
A descriptive statistic is a value that summarizes a collection.
Based on type of collection being summarized:
- Statistic - a descriptive statistic of a sample
- Parameter - a descriptive statistic of a population
A sample is unbiased if its sample statistic equals the population parameter
A single sample may be unbiased regarding one descriptive statistic and biased regarding another
Many descriptive statistics can be used to represent a collection
The information relayed by a descriptive statistic does not fully represent the collection
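As a small illustration of the points above, the sketch below compares a statistic with a parameter for two different descriptive statistics. The population and sample values are made up for the example. Here the sample happens to be unbiased regarding the median and biased regarding the mean.

```python
from statistics import mean, median

# Hypothetical population and sample (values invented for illustration).
population = [2, 4, 6, 8, 30]
sample = [2, 6, 8]  # three members taken from the population

# Parameters describe the population; statistics describe the sample.
param_mean, param_median = mean(population), median(population)
stat_mean, stat_median = mean(sample), median(sample)

print(f"mean:   parameter={param_mean}, statistic={stat_mean:.2f}")
print(f"median: parameter={param_median}, statistic={stat_median}")
```

The median of the sample equals the median of the population (both 6), while the sample mean (about 5.33) underestimates the population mean (10).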
Inferential statistics
The goal of inferential statistics is to estimate the value of a parameter (descriptive statistic of a population) from the value of a statistic (descriptive statistic of a sample)
- Without knowing the parameter, can we know if the sample is biased?
- Is it reasonable to expect samples to be unbiased?
- If we expect samples to be unbiased, is there a better way of estimating a population parameter than simply returning the sample statistic's value as our estimate?
Sampling methods
Bias due to random chance, not methodology
We never know whether a sample is biased, but we can develop unbiased sampling methods, which remove bias from the method of choosing the sample. Any bias that remains in the sample then comes purely from random chance.
Example
We want to find the average height of female FHSU students.
We go to the basketball team and choose 5 players
method - biased, sample - biased
We get 5 random students, and they all happen to be on the basketball team
method - unbiased, sample - biased
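The difference between a biased method and a biased sample can be seen in a rough simulation. Everything in the sketch below is an assumption made for illustration: the heights are randomly generated, the team size and population size are invented, and `average_estimate` is a hypothetical helper. It repeats each method many times and averages the resulting sample means.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up heights in cm: 15 basketball players and 485 other students.
team = [rng.gauss(183, 5) for _ in range(15)]
others = [rng.gauss(164, 6) for _ in range(485)]
population = team + others
parameter = mean(population)  # the value we are trying to estimate

def average_estimate(pool, trials=2000, n=5):
    """Average of many sample means when n members are drawn from pool."""
    return mean(mean(rng.sample(pool, n)) for _ in range(trials))

biased_method = average_estimate(team)          # only ever samples the team
unbiased_method = average_estimate(population)  # samples from all students

print(f"parameter:        {parameter:.1f}")
print(f"team-only method: {biased_method:.1f}")
print(f"random method:    {unbiased_method:.1f}")
```

Individual random samples can still land far from the parameter, but averaged over many repetitions the random method centers on it, while the team-only method stays too high no matter how often it is repeated.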
Unbiased Sampling Methods
Simple Random Sampling (SRS)
Gold standard
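For any given sample size, every possible sample of that size is equally likely to be chosen
A simple random sample can be formed by randomly choosing a member of the population, and then repeatedly randomly selecting another member (excluding the previously chosen ones) until the desired sample size is reached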
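The draw-one-at-a-time procedure can be sketched as follows. The function name `simple_random_sample` and the student list are hypothetical. Python's standard `random.sample` gives an equivalent result in one call, so the loop is shown only to mirror the description.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct members, one at a time, without replacement."""
    remaining = list(population)
    sample = []
    for _ in range(n):
        pick = rng.randrange(len(remaining))  # each remaining member equally likely
        sample.append(remaining.pop(pick))    # exclude it from later draws
    return sample

rng = random.Random(42)  # fixed seed so the run is repeatable
students = [f"student_{i}" for i in range(1, 21)]  # hypothetical population
sample = simple_random_sample(students, 5, rng)
print(sample)
```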
Natural groupings
A group is homogeneous if every member is similar with respect to the interest at hand
A group is heterogeneous if the members are significantly different with respect to the interest at hand
Unbiased sampling methods with natural groupings
Stratified Random Sampling
Make groups (strata) of similar members, then choose from each and every stratum
Within groups - homogeneous
Between groups - significant differences
Natural groups are called strata
Procedure - conduct SRS from each stratum in a proportionate way
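A minimal sketch of proportionate stratified sampling, assuming the strata are class years with made-up sizes. The helper `stratified_sample` is hypothetical. It takes the same fraction from every stratum, so larger strata contribute more members.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Conduct an SRS within every stratum, taking the same fraction of each."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # proportionate share of this stratum
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}
sample = stratified_sample(strata, 0.1, rng)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)  # {'freshman': 4, 'sophomore': 3, 'junior': 2, 'senior': 1}
```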
Cluster Random Sampling
Make groups (clusters) of significantly different members, then study a cluster
Within groups - heterogeneous
Between groups - mostly similar
Natural groups are called clusters
Procedure - conduct SRS on the clusters
Single-stage - study every member of each of the clusters
Double-stage - conduct SRS from each of the clusters chosen in the initial SRS
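Both variants can be sketched with one function. The dorm clusters and the name `cluster_sample` are assumptions made for illustration. Leaving `per_cluster` unset studies every member of the chosen clusters (single-stage), while setting it conducts a second SRS inside each chosen cluster (double-stage).

```python
import random

def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """SRS on the clusters, then study all or some members of each chosen one."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS on the cluster names
    members = []
    for name in chosen:
        if per_cluster is None:
            members.extend(clusters[name])  # single-stage: every member
        else:
            members.extend(rng.sample(clusters[name], per_cluster))  # second SRS
    return chosen, members

rng = random.Random(7)  # fixed seed so the run is repeatable
# Hypothetical clusters: six dorms with eight residents each.
clusters = {f"dorm_{d}": [f"dorm_{d}_resident_{i}" for i in range(8)] for d in "ABCDEF"}

single_chosen, single_stage = cluster_sample(clusters, 2, rng)
double_chosen, double_stage = cluster_sample(clusters, 2, rng, per_cluster=3)
print(len(single_stage), len(double_stage))  # 16 6
```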
Biased sampling methods
Convenience sampling
Sample is constructed of members of the population with a particular characteristic that simplifies the collection of data
for example - a teacher surveying their own students
Voluntary response sampling
A mass request is sent or posted asking for participation, and all respondents are included in the sample
the most common responders will be people with strong positive/negative opinions towards the subject
Systematic sampling
Sample is constructed by selecting members of the population based on some pattern of encounter
for example - selecting every 5th member of the population
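The every-5th-member pattern can be sketched with list slicing. The numbered queue and the helper `systematic_sample` are hypothetical. Note that the result is fully determined by the order of encounter and the starting point, with no random draw involved.

```python
def systematic_sample(population, step, start=0):
    """Select every step-th member, beginning at index start."""
    return population[start::step]

# Hypothetical population: 20 members numbered in order of encounter.
queue = list(range(1, 21))
sample = systematic_sample(queue, step=5, start=4)
print(sample)  # [5, 10, 15, 20]
```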
Final reinforcement on bias
A sampling method is unbiased if all bias in a sample is due to random chance.
Both biased and unbiased sampling methods can produce biased and unbiased samples
Types of variables
Importance of variable classification
Visualization norms differ based on variable type
Some descriptive statistics are not defined for some variables, or lack meaning
Classification of variables
Quantitative vs qualitative
Quantitative
Has numerical representation that is subject to meaningful arithmetic difference
Qualitative
Variable that is not quantitative
Numerical qualitative variables
Examples of qualitative numerical variables
Race placement (1, 2, ..., 8) - taking the difference between places doesn't give more information than we already have
Telephone numbers - a larger telephone number carries no greater significance
Non-numerical quantitative variables
Birth date (Jan 3, 1982) - is quantitative because we can meaningfully compare it with another birth date
Continuous and discrete quantitative variables
A quantitative variable is continuous if it can take on any numerical value in some interval of real numbers (height, weight, time)
A quantitative variable is discrete if it is not continuous
Levels of measurement
| Level | Categorizes | Orders | Meaningful arithmetic difference | Meaningful ratios |
|---|---|---|---|---|
| Nominal | X | | | |
| Ordinal | X | X | | |
| Interval | X | X | X | |
| Ratio | X | X | X | X |

Examples:
- Nominal - is person in grad school
- Ordinal - got first place
- Interval - is between x and y
- Ratio -

Nominal and ordinal levels are qualitative
Interval and ratio levels are quantitative
It can be hard to determine whether ratios are meaningful.
Ratios are meaningful if the zero value is a meaningful zero, that is, if zero means that none of the quantity is present.
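Temperature is a common way to see the meaningful-zero test at work. It is not one of the examples in these notes and is brought in here only as an illustration. Zero on the Celsius scale is an arbitrary reference point, while zero on the Kelvin scale means no thermal energy, so differences agree across the two scales but ratios do not.

```python
def celsius_to_kelvin(celsius):
    """Shift from the Celsius zero to the Kelvin (absolute) zero."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # two illustrative temperatures in Celsius
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

diff_c = warm_c - cold_c   # differences are the same on both scales
diff_k = warm_k - cold_k
ratio_c = warm_c / cold_c  # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = warm_k / cold_k  # about 1.035, the physically meaningful ratio

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

Celsius therefore sits at the interval level and Kelvin at the ratio level, even though both measure the same quantity.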