5 - Sampling Distributions
Sampling Distribution of Sample Means
We are interested in some quantitative variable, so every member of the population is associated with a particular numerical value
The sampling distribution of sample means is the probability distribution for the random variable resulting from:
- Random experiment - iid sampling with a fixed sample size
- Numerical assignment - computing the sample mean
Central Limit Theorem
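Given any infinite population with population mean μ and standard deviation σ, the sampling distribution of sample means for samples of size n has mean μ and standard deviation σ/√n
Also - as the sample size n grows, the sampling distribution of sample means approaches a normal distribution, whatever the shape of the population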
Practically infinite populations:
The distributions produced by SRS and iid are good approximations for one another for practically infinite populations
For the most common populations, a sample size of n ≥ 30 is usually treated as large enough for the CLT to apply
If the population is normal, all sampling distributions of sample means (SDSM) are normal as well, regardless of the sample size
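The CLT claim can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: it assumes a uniform population on 10 to 35 (the same shape used in the exercises), and the seed, sample size, and number of repetitions are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

LOW, HIGH = 10, 35    # assumed uniform population (illustrative)
N = 32                # size of each iid sample
NUM_SAMPLES = 5000    # how many samples to draw (arbitrary choice)

# Draw many iid samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.uniform(LOW, HIGH) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

pop_mean = (LOW + HIGH) / 2            # mean of a uniform population
pop_sd = (HIGH - LOW) / 12 ** 0.5      # standard deviation of a uniform population
theory_se = pop_sd / N ** 0.5          # sigma / sqrt(n)

print("mean of sample means:", statistics.fmean(sample_means), "theory:", pop_mean)
print("sd of sample means:  ", statistics.stdev(sample_means), "theory:", theory_se)
```

The simulated mean and standard deviation of the sample means should land close to μ and σ/√n, even though the population itself is not normal.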
Exercises
A population is uniformly distributed, taking on values from 10 to 35. The expected value of the population is 22.5, and the standard deviation is (35 − 10)/√12 ≈ 7.22
- Determine the probability that a randomly selected member of the population has a value larger than 20
- Determine the probability that a random sample of size 32 taken from the population has an average value larger than 20
Now we're in the world of sampling distributions of sample means.
The sample size is large (n = 32), so CLT applies. Always write the words 'so CLT applies'
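As a check on this exercise, here is a short sketch of both computations. It assumes the normal approximation from the CLT for the sample mean; the variable names are illustrative, and `statistics.NormalDist` is used in place of the spreadsheet functions.

```python
import math
from statistics import NormalDist

low, high = 10, 35
mu = (low + high) / 2                    # 22.5
sigma = (high - low) / math.sqrt(12)     # standard deviation of a uniform population

# Single member: the population is uniform, so use the uniform distribution directly.
p_single = (high - 20) / (high - low)

# Sample mean, n = 32: by the CLT the SDSM is approximately normal
# with mean mu and standard deviation sigma / sqrt(n).
n = 32
se = sigma / math.sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(20)

print(f"P(X > 20)     = {p_single:.4f}")
print(f"P(mean > 20)  = {p_mean:.4f}")
```

The first probability is 0.6, while the second is roughly 0.975, because averaging 32 values concentrates the distribution around 22.5.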
The average value of a house in a particular city is $175000 with a standard deviation of $10000
- Determine the probability that a randomly selected house in this city has a value larger than $180000
We cannot answer this question, because it asks about a single home and we don't know the parent distribution
- Determine the probability that a random sample of size 49 has an average value larger than $180000
- Determine the probability that a random sample of size 49 taken from the population has an average within 2 standard deviations of the mean
- Determine the probability that a random sample of size 64 taken from the population has an average within 2 standard deviations of the mean
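The sample-mean parts of this exercise can be sketched as follows, relying on the CLT because the sample sizes are large. The phrase "within 2 standard deviations" is ambiguous in the notes, so the sketch computes both readings: 2 population standard deviations ($20000) and 2 standard deviations of the sampling distribution. Both readings are assumptions, and the helper names are illustrative.

```python
import math
from statistics import NormalDist

MU, SIGMA = 175_000, 10_000

def sdsm(n):
    """Approximate SDSM for samples of size n (CLT assumed to apply)."""
    return NormalDist(MU, SIGMA / math.sqrt(n))

def prob_within(n, half_width):
    """P(MU - half_width < sample mean < MU + half_width)."""
    dist = sdsm(n)
    return dist.cdf(MU + half_width) - dist.cdf(MU - half_width)

# Sample of size 49 with an average above $180000.
p_above = 1 - sdsm(49).cdf(180_000)
print(f"n=49  P(mean > 180000) = {p_above:.6f}")

for n in (49, 64):
    # Reading 1 (assumption): within 2 population standard deviations.
    pop_reading = prob_within(n, 2 * SIGMA)
    # Reading 2 (assumption): within 2 standard deviations of the SDSM.
    se_reading = prob_within(n, 2 * SIGMA / math.sqrt(n))
    print(f"n={n}  population-sd reading: {pop_reading:.6f}  sdsm-sd reading: {se_reading:.4f}")
```

Under the first reading the probability is essentially 1 for both sample sizes; under the second it is about 0.954 regardless of n.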
IQ scores are normally distributed with a mean of 100 and standard deviation of 10. Suppose a random sample of size 4 is taken from the population. Determine the following probabilities:
Population distribution (which is normal), with μ = 100 and σ = 10
Because we have a sample (n = 4), we work with the SDSM
The mean and standard deviation are the same as in 2.
Now, for sample sizes of 16, 64, 100, and 150, we can see that the standard deviation of the SDSM, σ/√n, gets smaller as the sample size grows
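The effect of the sample size can be tabulated with a small sketch. The threshold of 105 used below is a hypothetical value chosen only for illustration, since the probabilities asked for in the original exercise are not preserved in these notes.

```python
import math
from statistics import NormalDist

MU, SIGMA = 100, 10
THRESHOLD = 105   # hypothetical cut-off, not taken from the exercise

standard_errors = {}
tail_probs = {}
for n in (4, 16, 64, 100, 150):
    se = SIGMA / math.sqrt(n)            # standard deviation of the SDSM
    standard_errors[n] = se
    # The population is normal, so the SDSM is normal for every n.
    tail_probs[n] = 1 - NormalDist(MU, se).cdf(THRESHOLD)
    print(f"n={n:>3}  sd of SDSM = {se:.4f}  P(mean > {THRESHOLD}) = {tail_probs[n]:.6f}")
```

Because the population is normal, no minimum sample size is needed here; the only thing that changes with n is how tightly the sample means cluster around 100.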
Sampling Distribution of Sample Proportions
We are interested in some characteristic such that every member of the population either has or does not have the characteristic
The sampling distribution of sample proportions is the probability distribution for the random variable resulting from:
- Random experiment - iid sampling with fixed sample size
- Numerical assignment - computing the sample proportion
Constructing Sampling Distribution of Sample Proportions
Consider a population of size
The distributions produced by SRS and iid are good approximations for one another for practically infinite populations
Construction
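Determine all the possible sample proportions
p̂ = k/n for k = 0, 1, ..., n
Show how the SDSP is related to a binomial variable
Our numerator, the number of sample members that have the characteristic, is a binomial variable with n trials and success probability p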
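Determine the expected value and variance of the SDSP
We can use the really useful formulas for a binomial variable X: E(X) = np and Var(X) = np(1 − p)
To bring those formulas into the SDSP world, we just have to divide X by n, which multiplies the expected value by 1/n and the variance by 1/n²
So the expected value is E(p̂) = p and the variance is Var(p̂) = p(1 − p)/n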
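The scaling from the binomial count to the sample proportion can be verified numerically. This is a sketch under assumed values (n = 6 and p = 0.4 are illustrative), comparing the closed-form result against a direct enumeration of the distribution; the function names are hypothetical.

```python
from math import comb

def sdsp_moments(n, p):
    """Expected value and variance of the sample proportion via the binomial formulas."""
    binom_mean = n * p             # E(X) for the binomial count X
    binom_var = n * p * (1 - p)    # Var(X)
    # Dividing X by n scales the mean by 1/n and the variance by 1/n**2.
    return binom_mean / n, binom_var / n**2

def sdsp_moments_by_enumeration(n, p):
    """Same quantities computed directly from the probability of each k/n."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pr for k, pr in enumerate(probs))
    var = sum((k / n - mean) ** 2 * pr for k, pr in enumerate(probs))
    return mean, var

formula_mean, formula_var = sdsp_moments(6, 0.4)
enum_mean, enum_var = sdsp_moments_by_enumeration(6, 0.4)
print(formula_mean, formula_var)
print(enum_mean, enum_var)
```

Both approaches should agree up to floating-point rounding, giving p for the expected value and p(1 − p)/n for the variance.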
Creating the probability table for possible sample proportions
| Sample proportion | Probability |
|---|---|
| 0/6 | binom.dist(0, 6, p, 0) |
| 1/6 | binom.dist(1, 6, p, 0) |
| 2/6 | ... |
| 3/6 | ... |
| 4/6 | ... |
| 5/6 | ... |
| 6/6 | ... |
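The same probability table can be generated programmatically. The sketch below mirrors the non-cumulative `binom.dist(k, 6, p, 0)` entries with `math.comb`; the value p = 0.4 is an assumption for illustration, since the notes leave p unspecified, and `sdsp_table` is a hypothetical helper name.

```python
from math import comb

def sdsp_table(n, p):
    """Return (sample proportion, probability) pairs for k = 0..n successes."""
    rows = []
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)   # binomial pmf at k
        rows.append((k / n, prob))
    return rows

table = sdsp_table(n=6, p=0.4)   # p = 0.4 is an assumed value
for p_hat, prob in table:
    print(f"{p_hat:.4f}  {prob:.6f}")
```

The probabilities in the second column sum to 1, which is a quick sanity check on any hand-built version of the table.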
SDSP Computations
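If np > 5 and n(1 − p) > 5, the SDSP can be approximated by a normal distribution
A more conservative (cautious) criterion uses a larger threshold than 5
If these conditions are not met, do not use the normal approximation - use binomial computations
For exams, use the less conservative (>5) criterion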
Useful formulas: mean of the SDSP = p, standard deviation of the SDSP = √(p(1 − p)/n)
For the distribution to be normal - np > 5 and n(1 − p) > 5
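The normality check is simple enough to wrap in a helper. This is a minimal sketch; the function name and the `threshold` parameter are illustrative, with the default of 5 matching the less conservative criterion in these notes.

```python
def normal_approx_ok(n, p, threshold=5):
    """True when both expected counts, n*p and n*(1-p), exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_ok(20, 0.4))                # expected counts 8 and 12
print(normal_approx_ok(6, 0.4))                 # expected counts 2.4 and 3.6
print(normal_approx_ok(20, 0.4, threshold=10))  # a stricter, assumed threshold
```

When the check fails, the exact binomial computation should be used instead of the normal approximation.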
Examples
A politician has 40% of the electorate voting for him. Determine the probability that a random sample of size 20 will indicate that he will win the popular vote (sample proportion > 50%)
p = 0.4, n = 20
Checking if the distribution is normal - np = 8 > 5 and n(1 − p) = 12 > 5, so the normal distribution is appropriate
The =norm.dist result is an approximation, whereas =binom.dist gives the exact value. But for distributions that can be approximated by a normal, we should use =norm.dist
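To see how far the approximation is from the exact value in this example, the two computations can be sketched side by side. The sketch uses Python's standard library in place of the spreadsheet functions and applies no continuity correction, which is an assumption about how the normal computation is meant to be done.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.4

# Normal approximation: the SDSP has mean p and standard deviation sqrt(p(1-p)/n).
se = sqrt(p * (1 - p) / n)
p_normal = 1 - NormalDist(p, se).cdf(0.5)

# Exact binomial: a sample proportion above 50% means 11 or more of the 20.
p_exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(11, n + 1))

print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```

The normal approximation gives roughly 0.18 and the exact binomial roughly 0.13; the gap reflects the small sample size and the missing continuity correction.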
The MTHFR gene mutation has a prevalence rate of 40% among Caucasians.
- How large of a sample would be necessary to get the standard deviation of the sampling distribution of sample proportions to be less than 0.01?
Since √(0.4 × 0.6 / n) < 0.01 requires n > 2400, we should take at least n = 2401
- Determine ... for this smallest sample
- Determine ... for this smallest sample
- Determine k such that ... for this smallest sample
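We can use the fact that if the central part we want has probability c, then the left border cuts off a lower tail of (1 − c)/2, and use this value to calculate k
The second way is to use the z-distribution. Standardizing moves the mean to 0 and the standard deviation to 1, and allows us to use the standard normal distribution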
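Both the sample-size calculation and the two ways of finding k can be sketched in a few lines. The central probability of 0.95 is a hypothetical value, because the target probability in the original question is not preserved; exact fractions are used for the sample size to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction
from statistics import NormalDist

p = Fraction(2, 5)           # prevalence of 40%
target_sd = Fraction(1, 100)

# sqrt(p(1-p)/n) < target_sd  is equivalent to  n > p(1-p) / target_sd**2.
bound = p * (1 - p) / target_sd**2
n = math.floor(bound) + 1    # smallest integer strictly above the bound
se = math.sqrt(float(p * (1 - p)) / n)

c = 0.95                     # hypothetical central probability (assumption)
lower_tail = (1 - c) / 2

# Way 1: percentile of the SDSP itself gives the left border p - k.
left_border = NormalDist(float(p), se).inv_cdf(lower_tail)
k_direct = float(p) - left_border

# Way 2: standardize, find z on the standard normal, then scale back by se.
z = NormalDist().inv_cdf(lower_tail)
k_from_z = abs(z) * se

print(n, se, k_direct, k_from_z)
```

The two routes give the same k, since standardizing only shifts and rescales the same normal distribution.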