Chapter 1

Definitions

Population and sample

Population - collection containing all individual objects or events of interest
it is often unreasonable or impossible to examine the whole population
Sample - relatively small collection taken from a population that is studied to gain understanding about the population
a sample is biased to the degree that it under- or over-represents the population

Descriptive statistics

A descriptive statistic is a value that summarizes a collection.
Based on type of collection being summarized:
- Statistic - a descriptive statistic of a sample
- Parameter - a descriptive statistic of a population

A sample is unbiased if its sample statistic equals the population parameter
A single sample may be unbiased regarding one descriptive statistic and biased regarding another
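
As a small illustration of the two statements above, the sketch below compares a sample's statistics with the population's parameters. The height values and the helper name is_unbiased are made up for this example and are not part of the course material.

```python
from statistics import mean

# Hypothetical population of heights in cm (made-up values).
population = [150, 155, 160, 165, 170, 175, 180]
# One particular sample taken from that population.
sample = [155, 165, 175]


def is_unbiased(sample, population, describe):
    """True if the sample statistic equals the population parameter."""
    return describe(sample) == describe(population)


print(is_unbiased(sample, population, mean))  # compares the two means
print(is_unbiased(sample, population, max))   # compares the two maxima
```

Here the sample mean matches the population mean, while the sample maximum does not, so the same sample is unbiased regarding the mean and biased regarding the maximum.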

Many descriptive statistics can be used to represent a collection

The information relayed by a descriptive statistic does not fully represent the collection

Inferential statistics

Goal

The goal of inferential statistics is to estimate the value of a parameter (descriptive statistic of a population) from the value of a statistic (descriptive statistic of a sample)

Fundamental questions

  • Without knowing the parameter, can we know if a sample is biased?
  • Is it reasonable to expect samples to be unbiased?
  • If we expect samples to be unbiased, is there a better way of estimating a population parameter than just returning the sample statistic's value as our estimate?

Sampling methods

Bias due to random chance, not methodology

We never know if the sample is biased or not, but we can develop unbiased sampling methods, which remove the bias that comes from the method of choosing the sample. Any bias that remains comes purely from random chance in which members end up in the sample.

Example

We want to find the average height of female FHSU students.

We go to the basketball team and choose 5 players
method - biased, sample - biased
We get 5 random students, and they all happen to be on the basketball team
method - unbiased, sample - biased

Unbiased Sampling Methods

Simple Random Sampling (SRS)

Gold standard

For any given sample size n, every possible sample of that size is equally likely to be chosen
A simple random sample can be formed by randomly choosing a member of the population, then randomly selecting another member (excluding those already chosen), and repeating until n members have been chosen
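
The one-at-a-time procedure can be sketched in Python as follows. The function name simple_random_sample, the member IDs, and the fixed seed are assumptions made for illustration only.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n members one at a time, never re-choosing a member."""
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    remaining = list(population)  # copy, so the population is left intact
    chosen = []
    for _ in range(n):
        index = rng.randrange(len(remaining))  # each remaining member equally likely
        chosen.append(remaining.pop(index))    # remove it so it cannot be picked again
    return chosen


rng = random.Random(0)           # fixed seed only so reruns give the same sample
population = list(range(1, 21))  # hypothetical member IDs 1..20
sample = simple_random_sample(population, 5, rng)
print(sample)
```

In practice the standard library call random.sample(population, n) performs the same kind of selection without replacement.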

Natural groupings

A group is homogeneous if every member is similar with respect to the interest at hand
A group is heterogeneous if the members are significantly different with respect to the interest at hand

Unbiased sampling methods with natural groupings
Stratified Random Sampling

Make groups (strata) of similar members, then choose from each and every stratum

Within groups - homogeneous
Between groups - significant differences
Natural groups are called strata
Procedure - conduct SRS within each stratum, with sample sizes proportional to the stratum sizes
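
A minimal sketch of proportionate stratified sampling, assuming the strata are already known. The stratum names, sizes, and the function name stratified_sample are hypothetical, and rounding is one simple way (not the only one) to allocate the sample across strata.

```python
import random


def stratified_sample(strata, n, rng):
    """Proportionate stratified sample: an SRS is taken within every stratum."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample proportional to the stratum's share of the population.
        # Rounding can make the overall size differ slightly from n in general.
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata of member IDs with sizes 60, 30, and 10.
strata = {
    "first_year": list(range(0, 60)),
    "second_year": list(range(60, 90)),
    "third_year": list(range(90, 100)),
}
rng = random.Random(1)  # fixed seed only for repeatable output
sample = stratified_sample(strata, 10, rng)
print({name: len(members) for name, members in sample.items()})
```

With these sizes, a sample of 10 is split 6, 3, and 1 across the three strata, so every stratum is represented in proportion to its size.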

Cluster Random Sampling

Make groups (clusters) of significantly different members, then study some of the clusters

Within groups - heterogeneous
Between groups - mostly similar
Natural groups are called clusters
Procedure - conduct SRS on the clusters
Single-stage - study every member of each of the clusters
Double-stage - conduct SRS from each of the clusters chosen in the initial SRS
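
The single-stage and double-stage procedures can be sketched as below. The cluster names, member IDs, and the function name cluster_sample are assumptions made for this example.

```python
import random


def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """Choose whole clusters by SRS, then study all or some of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS on the cluster names
    if per_cluster is None:
        # Single-stage: every member of each chosen cluster is studied.
        return {name: list(clusters[name]) for name in chosen}
    # Double-stage: a further SRS is taken inside each chosen cluster.
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}


# Hypothetical clusters of member IDs.
clusters = {
    "dorm_a": [1, 2, 3, 4],
    "dorm_b": [5, 6, 7, 8],
    "dorm_c": [9, 10, 11, 12],
    "dorm_d": [13, 14, 15, 16],
}
single = cluster_sample(clusters, 2, random.Random(2))
double = cluster_sample(clusters, 2, random.Random(2), per_cluster=2)
print(single)
print(double)
```

Unlike stratified sampling, only the chosen clusters contribute members; the clusters left out are not studied at all.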

Biased sampling methods

Convenience sampling

Sample is constructed of members of the population with a particular characteristic that simplifies the collection of data
for example - a teacher doing a survey on their own students

Voluntary response sampling

Mass request is sent or posted asking for participation and all respondents are included in the sample
the most common responders will be people with strong positive/negative opinions towards the subject

Systematic sampling

Sample is constructed by selecting members of the population based on some pattern of encounter
for example - selecting every 5th member of the population
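
One way to see how a fixed pattern can introduce bias is a population whose values repeat with the same period as the selection pattern. The values and the function name systematic_sample below are made up for illustration.

```python
from statistics import mean


def systematic_sample(population, step):
    """Select every step-th member in order of encounter."""
    return population[step - 1::step]


# Hypothetical values that repeat with a period of 5.
population = [10, 20, 30, 40, 50] * 4
sample = systematic_sample(population, 5)

print(sample)
print(mean(population), mean(sample))
```

Every selected member has the value 50, so the sample mean is 50 while the population mean is 30. The bias here comes from the method itself, not from random chance.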

Final reinforcement on bias

Sampling methods are unbiased if all bias in a sample is due to random chance.

Both biased and unbiased sampling methods can produce biased and unbiased samples
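
For a very small population, every possible simple random sample can be listed, which makes the point concrete. The sketch below uses made-up height values and reads "unbiased method" as meaning that the sample means average out to the population mean, which is one common interpretation rather than a definition taken from these notes.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of heights in cm.
population = [160, 165, 170, 175, 180]
parameter = mean(population)

# Under SRS with n = 2, each of these samples is equally likely.
all_samples = list(combinations(population, 2))
sample_means = [mean(s) for s in all_samples]

# Samples whose mean equals the population mean exactly.
unbiased_samples = [s for s in all_samples if mean(s) == parameter]

print(len(all_samples), len(unbiased_samples))
print(mean(sample_means) == parameter)
```

Only 2 of the 10 possible samples have a mean equal to the population mean, yet the sample means taken over all equally likely samples average to it exactly. An unbiased method therefore still produces mostly biased individual samples in this example.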

Types of variables

Importance of variable classification

Visualization norms differ based on variable type
Some descriptive statistics are not defined for some variables, or lack meaning

Classification of variables

Quantitative vs qualitative
Quantitative

Has a numerical representation for which arithmetic differences are meaningful

Qualitative

Variable that is not quantitative

Numerical qualitative variables

Examples of qualitative numerical variables
Race placement (1, 2, ..., 8) - comparing the places doesn't give more information than we already have
Telephone numbers - a larger telephone number carries no additional meaning

Non-numerical quantitative variables

Birth date (Jan 3, 1982) - is quantitative because the difference between two birth dates (an amount of time) is meaningful

Continuous and discrete quantitative variables

A quant. var. is continuous if it can take on any numerical value in some interval of real numbers (height, weight, time)

A quant. var. is discrete if it is not continuous

Levels of measurement
Level    | Categorizes | Orders | Meaningful Arithm. Diff. | Meaningful ratios
Nominal  | X           |        |                          |
Ordinal  | X           | X      |                          |
Interval | X           | X      | X                        |
Ratio    | X           | X      | X                        | X
Nominal - is person in grad school
Ordinal - got first place
Interval - is between x and y
Ratio -

Nominal and ordinal levels are qualitative
Interval and ratio levels are quantitative

It's hard to determine whether all ratios are meaningful.
Ratios are meaningful if zero value is a meaningful zero.
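
Temperature is a standard illustration of the meaningful-zero rule and is added here as an example, not taken from the notes. Zero on the Celsius scale is an arbitrary reference point (interval level), while zero on the Kelvin scale is absolute zero (ratio level). The variable names below are hypothetical.

```python
import math


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales, so differences are meaningful.
difference_c = warm_c - cool_c
difference_k = warm_k - cool_k
print(math.isclose(difference_c, difference_k))

# Ratios disagree: only the scale with a meaningful zero gives a meaningful ratio.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k
print(ratio_c, ratio_k)
```

The Celsius ratio is 2, but 20 degrees is not "twice as warm" as 10 degrees; on the Kelvin scale the ratio is only slightly above 1.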