6 - Inferential Statistics

### Computations
Consider a population with $\mu = 50$ and $\sigma = 12$ that is randomly sampled with $n = 36$

  1. Determine the following probabilities:
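    We're using the SDSM (sampling distribution of sample means), so we calculate
    $\mu_{\bar{x}} = \mu = 50$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2$

    • $P(\bar{x} - 1 < \mu < \bar{x} + 1) = \text{norm.dist}(51, 50, 2, 1) - \text{norm.dist}(49, 50, 2, 1) = 0.3829$
    • $P(\bar{x} - 3 < \mu < \bar{x} + 3) = \text{norm.dist}(53, 50, 2, 1) - \text{norm.dist}(47, 50, 2, 1) = 0.8664$
    • $P(\bar{x} - 5 < \mu < \bar{x} + 5) = \text{norm.dist}(55, 50, 2, 1) - \text{norm.dist}(45, 50, 2, 1) = 0.9876$
      For each probability, find the appropriate $d > 0$
    • $P(\bar{x} - d < \mu < \bar{x} + d) = 0.68259 \rightarrow d = \mu - \text{norm.inv}\left(\frac{1 - 0.68259}{2}, \mu, \sigma_{\bar{x}}\right) \approx 2$
    • $P(\bar{x} - d < \mu < \bar{x} + d) = 0.9545 \rightarrow d = \mu - \text{norm.inv}\left(\frac{1 - 0.9545}{2}, \mu, \sigma_{\bar{x}}\right) \approx 4$
    • $P(\bar{x} - d < \mu < \bar{x} + d) = 0.99 \rightarrow d = \mu - \text{norm.inv}\left(\frac{1 - 0.99}{2}, \mu, \sigma_{\bar{x}}\right) \approx 5.152$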
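      We could use a trick: 0.68259 is (approximately) the probability of landing within one standard deviation of the mean, so $d = 1 \cdot \sigma_{\bar{x}} = 2$; 0.9545 corresponds to two standard deviations, so $d = 2\sigma_{\bar{x}} = 4$, etc. In general, for a probability $P$, $d = \text{norm.s.inv}\left(\frac{1 + P}{2}\right) \cdot \sigma_{\bar{x}}$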

    Other expressions equivalent to $\bar{x} - d < \mu < \bar{x} + d$:

    • $\bar{x} \in (\mu - d, \mu + d)$
    • $|\bar{x} - \mu| < d$
    • $d = (\text{\# of stdevs}) \cdot \sigma_{\bar{x}}$
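The Excel computations above can be cross-checked in Python. This is a minimal sketch, assuming the standard library's `statistics.NormalDist` stands in for norm.dist and norm.inv; the helper names `prob_within` and `distance_for` are made up for illustration.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean for mu = 50, sigma = 12, n = 36
mu, sigma, n = 50, 12, 36
sdsm = NormalDist(mu, sigma / n ** 0.5)  # standard deviation is 12 / 6 = 2


def prob_within(d: float) -> float:
    """P(mu - d < x_bar < mu + d), the difference of two cumulative areas."""
    return sdsm.cdf(mu + d) - sdsm.cdf(mu - d)


def distance_for(prob: float) -> float:
    """d > 0 such that prob_within(d) equals prob (inverse of the left tail)."""
    return mu - sdsm.inv_cdf((1 - prob) / 2)


for d in (1, 3, 5):
    print(d, round(prob_within(d), 4))
for p in (0.68259, 0.9545, 0.99):
    print(p, round(distance_for(p), 3))
```

The first loop corresponds to the three norm.dist differences, the second to the three norm.inv lines.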

Confidence Intervals

We do not know the value of the population parameter
We do not expect the value of our sample statistic to equal the population parameter
We want to use the sample statistic value to construct an interval that is likely to contain the population parameter
This likelihood, called the confidence level (CL), can be understood as the success rate of our construction process
For means and proportions, the interval is centered at the value of the computed statistic from the collected random sample

To produce the interval $(\bar{x} - d, \bar{x} + d)$, we first have to set the confidence level.
Select the confidence level (understood as the probability that, given the distance $d$, a randomly selected sample will produce a confidence interval containing the parameter)

We introduce $\alpha = 1 - CL$ as the failure rate (the probability that the constructed interval does not contain the parameter)

We don't want to choose $CL = 100\%$, because that would mean $CI = (-\infty, +\infty)$
![[rM drawing 2026-04-07-21.13.19.png|400]]

$$z_{\alpha/2} = \text{norm.s.inv}\left(CL + \frac{\alpha}{2}\right) = -1 \cdot \text{norm.s.inv}\left(\frac{\alpha}{2}\right)$$
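Both forms of the critical value can be checked against each other. A small sketch, assuming `statistics.NormalDist().inv_cdf` plays the role of norm.s.inv; the function names are illustrative only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(cl: float) -> float:
    """z_{alpha/2} as the point with area CL + alpha/2 to its left."""
    alpha = 1 - cl
    return STANDARD_NORMAL.inv_cdf(cl + alpha / 2)


def z_critical_from_left_tail(cl: float) -> float:
    """Same value, computed as -1 times the point with area alpha/2 to its left."""
    alpha = 1 - cl
    return -STANDARD_NORMAL.inv_cdf(alpha / 2)


for cl in (0.92, 0.95, 0.99):
    print(cl, round(z_critical(cl), 5), round(z_critical_from_left_tail(cl), 5))
```

The two columns agree because the standard normal curve is symmetric about 0.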

Determining the Margin of Error

The SDSM and SDSP are both approximately normal under certain conditions. These conditions will need to be met for our construction method

To calculate the area between two points under a normal curve, it is sufficient to know how many standard deviations each point is away from the mean

This fact allows us to determine how many standard deviations away from the mean we need to go in order to achieve a given CL
The number of standard deviations away from the mean we need to go in order to achieve the CL is called the critical value

Sampling Distributions

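SDSM - iid random sampling with size $n$
$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

SDSP - iid random sampling with size $n$ - shape is always a scaled binomial
$$\mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$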
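The SDSM formulas can be checked empirically. The simulation below is only a sketch: it assumes a normal population with the values from the Computations section ($\mu = 50$, $\sigma = 12$, $n = 36$), and the repetition count and seed are arbitrary choices.

```python
import random
from statistics import fmean, pstdev


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` iid samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


means = simulate_sample_means(mu=50, sigma=12, n=36, reps=5000)

# The mean of the sample means should be close to mu = 50,
# and their standard deviation close to sigma / sqrt(n) = 2.
print(round(fmean(means), 2), round(pstdev(means), 2))
```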

Definitions and Interpretations

Confidence Level (CL)

Probability of selecting a random sample that will produce a confidence interval containing the parameter
Success rate of construction process
Area between negative and positive critical values

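$$\text{For Means: } (\bar{x} - ME, \bar{x} + ME), \quad ME = \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$$
$$\text{For Proportions: } (\hat{p} - ME, \hat{p} + ME), \quad ME = \sqrt{\frac{pq}{n}} z_{\alpha/2}$$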

Set confidence level - success rate of construction
Use CL to determine ME
Find # of stdev necessary to attain confidence (critical value)
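Then, $ME$ = (# of $\sigma$) $\cdot$ (standard deviation of the sampling distribution), i.e. $z_{\alpha/2} \cdot \sigma_{\bar{x}}$ or $z_{\alpha/2} \cdot \sigma_{\hat{p}}$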

The problem with proportions is that the goal of inferential statistics is to estimate the parameter $p$, yet the formula for $ME$ uses that unknown value. The best we can do, once we have obtained our sample, is to substitute $\hat{p}$ and $\hat{q}$:

$$ME = \sqrt{\frac{\hat{p}\hat{q}}{n}} z_{\alpha/2}$$

This is why it's important to meet the normal distribution conditions, so that the normal distribution can be used instead of the binomial one.

Alpha Level $\alpha = 1 - CL$

Probability of selecting a random sample that will not produce a confidence interval containing the parameter
Failure rate of construction process
Total area in the tails (evenly split between left and right tails)

Critical Values from Standard Normal Distribution $z_{\alpha/2}$

Number of standard deviations away from the mean we must go, in both directions, in order to achieve the confidence level

Confidence Intervals for Proportions

Requirement - SDSP approximately normal
Random sample of size $n$ such that $n\hat{p} > 5$ and $n\hat{q} > 5$

Exercises

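  1. A QC manager randomly selects 144 light sensors each day of production. Company policy mandates a manufacturing overhaul if the company is confident, at the 92% confidence level, that the percentage of defective light sensors produced in a day is larger than 5%. Twelve of the randomly sampled sensors are found to be defective. Construct the confidence interval and make a recommendation to the manager.
    $CL = 92\%$, $\alpha = 8\%$
    The sampling distribution is approximately normal:
    $n = 144$, $x = 12$
    $n\hat{p} = 12 > 5$ (we consider defective as success)
    $n\hat{q} = 132 > 5$
    So we can construct the CI
    $\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$
    $\hat{p} = \frac{12}{144} = \frac{1}{12}$, $\hat{q} = \frac{11}{12}$
    $z_{\alpha/2} = \text{norm.s.inv}(0.96) = 1.75068$ (norm.s.inv returns the z-score directly, because the standard normal distribution has $\mu = 0$ and $\sigma = 1$)
    Lower bound - $LB = \hat{p} - z_{\alpha/2}\sigma_{\hat{p}} = \frac{1}{12} - 1.75068\sqrt{\frac{\frac{1}{12} \cdot \frac{11}{12}}{144}} = 0.04301$
    Upper bound - $UB = \hat{p} + z_{\alpha/2}\sigma_{\hat{p}} = \frac{1}{12} + 1.75068\sqrt{\frac{\frac{1}{12} \cdot \frac{11}{12}}{144}} = 0.12366$
    so the CI - $(4.301\%, 12.366\%)$
    At the 92% confidence level, the population proportion of defective sensors is between 4.301% and 12.366%. Because the interval includes values below 5%, we cannot be confident that the defect rate is larger than 5%, so the policy does not mandate an overhaul.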
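  2. The QC manager would like to update the daily sample size so that the margin of error is less than 2%. What would that sample size be?
    $z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < 0.02$
    $\left(\frac{z_{\alpha/2}\sqrt{\hat{p}\hat{q}}}{0.02}\right)^2 < n$
    We see that the required $n$ is highest when $\hat{p}\hat{q}$ is largest.
    $\hat{p}\hat{q} = \hat{p}(1 - \hat{p}) = \hat{p} - \hat{p}^2$ is maximized at $\hat{p} = \frac{1}{2}$
    With $\hat{p} = \frac{1}{2}$: $n > \left(\frac{1.75068 \cdot 0.5}{0.02}\right)^2 \approx 1915.6$, so $n = 1916$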

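$\hat{p} = \frac{1}{2}$ is the most conservative answer to the question 'when will the margin of error be the largest?', so the sample size it gives is large enough whatever the true proportion turns out to be. It will be the solution to all of those questions. Always.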
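The two QC exercises can be reproduced with a couple of helper functions. This is a sketch under the notes' own assumptions (normal approximation, $n\hat{p} > 5$ and $n\hat{q} > 5$); the function names are hypothetical and `NormalDist().inv_cdf` stands in for norm.s.inv.

```python
import math
from statistics import NormalDist


def z_critical(cl: float) -> float:
    """z_{alpha/2} for a confidence level 0 < cl < 1."""
    return NormalDist().inv_cdf(cl + (1 - cl) / 2)


def proportion_ci(x: int, n: int, cl: float) -> tuple[float, float]:
    """Confidence interval (LB, UB) for a proportion from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("need n*p_hat > 5 and n*q_hat > 5 for the normal approximation")
    me = z_critical(cl) * math.sqrt(p_hat * q_hat / n)
    return p_hat - me, p_hat + me


def sample_size_for_proportion(me: float, cl: float, p_hat: float = 0.5) -> int:
    """Smallest integer n with margin of error strictly below `me`.

    The default p_hat = 0.5 is the conservative choice that maximizes p_hat * q_hat.
    """
    bound = (z_critical(cl) * math.sqrt(p_hat * (1 - p_hat)) / me) ** 2
    return math.floor(bound) + 1


lb, ub = proportion_ci(x=12, n=144, cl=0.92)
print(round(lb, 5), round(ub, 5))            # Exercise 1 interval
print(sample_size_for_proportion(0.02, 0.92))  # Exercise 2, conservative p_hat
```

Passing `p_hat=1/12` to `sample_size_for_proportion` instead of the default gives a smaller sample size, which is only safe if the defect rate really stays near the observed one.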

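  1. You are running for president of SGA at FHSU. Your campaign team randomly selects 100 students to see who they are voting for, and only 37 of them say they are voting for you. Construct a 99% confidence interval. A simple majority is needed to win; should you be concerned?
    $CL = 99\%$, $\alpha = 1\%$, $n = 100$, $x = 37$, $\hat{p} = \frac{37}{100} = 0.37$, $\hat{q} = \frac{63}{100} = 0.63$
    $n\hat{p} = 100 \cdot 0.37 = 37 > 5$ and $n\hat{q} = 100 \cdot 0.63 = 63 > 5$, so the SDSP is approximately normal
    $z_{\alpha/2} = \text{norm.s.inv}\left(CL + \frac{\alpha}{2}\right) = \text{norm.s.inv}(0.99 + 0.005) = \text{norm.s.inv}(0.995) = 2.57583$
    $\sigma_{\hat{p}} = \sqrt{\frac{0.37 \cdot 0.63}{100}} = 0.04828$
    $LB = 0.37 - 2.57583 \cdot 0.04828 = 0.24564$ and $UB = 0.37 + 2.57583 \cdot 0.04828 = 0.49436$
    The whole interval lies below 50%, so yes, you should be concerned.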

Three types of solutions

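  1. A journal published $(0.11, 0.14)$ as a 95% CI for the proportion of people who regularly attend the movie theater. What can you deduce about the sample data?
    $2ME = 0.14 - 0.11 = 0.03$, so $ME = 0.015$
    $\hat{p} = \frac{0.11 + 0.14}{2} = 0.125$, $\hat{q} = 0.875$
    $ME = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = \text{norm.s.inv}(0.975)\sqrt{\frac{0.125 \cdot 0.875}{n}} \Rightarrow n \approx 1868$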
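Working backwards from a published interval is mechanical enough to script. A minimal sketch, assuming the journal used the usual normal-approximation interval centered at $\hat{p}$; the function name is made up for illustration.

```python
import math
from statistics import NormalDist


def deduce_from_ci(lower: float, upper: float, cl: float):
    """Recover ME, p_hat and the (unrounded) sample size from a proportion CI."""
    me = (upper - lower) / 2       # half the interval length
    p_hat = (lower + upper) / 2    # the interval is centered at p_hat
    z = NormalDist().inv_cdf(cl + (1 - cl) / 2)
    n = z ** 2 * p_hat * (1 - p_hat) / me ** 2  # solve ME = z * sqrt(p*q/n) for n
    return me, p_hat, n


me, p_hat, n = deduce_from_ci(0.11, 0.14, 0.95)
print(round(me, 3), round(p_hat, 3), math.ceil(n))
```

The unrounded value of `n` lands slightly below 1868; since the published bounds are themselves rounded, the sample size is only recovered approximately.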

Confidence Intervals for Means

Important

$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \qquad CI = (\bar{x} - ME, \bar{x} + ME)$$

Requirement - the SDSM is normal: either $n > 30$, or the population is normal; in both cases the sample must be random

Sampling Distribution of Sample Variances

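Looking at the distribution of sample variances, it is skewed to the right, so

$$P(s^2 < \mu_{s^2}) > \frac{1}{2}$$

and knowing that the mean of the sample variances is equal to the population variance ($\mu_{s^2} = \sigma^2$)

$$P(s^2 < \sigma^2) > \frac{1}{2}$$

So $s$ usually underestimates $\sigma$, and the estimated standard deviation $\frac{s}{\sqrt{n}}$ is too small; to compensate, we make the critical value bigger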
We create a new distribution, called the t-distribution, which gives us a usable critical value. In Excel: =t.dist(x, d.f=n-1, 1) and =t.inv()

$$\text{t.dist}(x, \text{d.f} = n - 1, 1) = P(t < x) \quad \text{(area to the left)}$$
$$\text{t.inv}(\text{area}, \text{d.f} = n - 1) = x \quad \text{($x$ that has that area to the left)}$$

The t-distribution has mean 0, is symmetric about 0, and is bell-shaped
![[Pasted image 20260414130120.png|405]]
The tails of the t-distribution are fatter than those of the z-distribution

$$t_{\alpha/2} = \text{t.inv}\left(CL + \frac{\alpha}{2}, n - 1\right)$$
t-distribution should be used when working on means with σ unknown
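Python's standard library has no counterpart to Excel's t.dist and t.inv, so the sketch below rebuilds them numerically: Simpson's rule on the t density for the area, and bisection for the inverse. This is a swapped-in technique for illustration only, not how Excel computes these values, and the function names merely mirror the Excel ones.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_dist(x: float, df: int) -> float:
    """P(t < x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(1, math.ceil(a / 0.02))  # even count, step width at most 0.01
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inv(area: float, df: int) -> float:
    """x with `area` to its left, by bisection; assumes 0 < area < 1."""
    if area < 0.5:
        return -t_inv(1 - area, df)  # use symmetry for the left half
    lo, hi = 0.0, 1000.0  # assumes the answer is below 1000
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_dist(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Critical values for CL = 95%: t with n - 1 = 9 degrees of freedom versus z
print(round(t_inv(0.975, 9), 4), round(NormalDist().inv_cdf(0.975), 4))
```

The t critical value comes out larger than the z critical value, which is the fatter tails showing up in the numbers.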

Exercises

Many semi-trucks can legally haul up to 48000 pounds of cargo. A company says that their trailer can haul 35 cows. The consumer questions whether the trailer can actually do that. Assume the weights of cows are normally distributed. The consumer randomly samples 10 cows and finds the average weight to be 1368 pounds, with a standard deviation of 49 lbs. Construct a 95% CI for the population mean weight of cows.

Not Knowing t-distribution

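$\sigma = 49$ lbs
$\bar{x} = 1368$ lbs
$n = 10$
$z_{\alpha/2} = \text{norm.s.inv}(0.975) = 1.95997$
$LB = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1337.63$
$UB = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1398.37$
So in the worst-case scenario, 35 cows will weigh $35 \cdot 1398.37 \text{ lbs} = 48942.95 \text{ lbs}$, which is a little over the 48000 lbs legal limit. For the trailer to haul 35 cows, the average weight of a cow would have to be at most $\frac{48000}{35} \approx 1371$ lbs.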

Determine how many fully grown cows would need to be sampled so that the confidence interval would contain at most 1 whole number

$$2ME = \text{Length} < 1 \Rightarrow ME < 0.5 \Rightarrow 1.95997 \cdot \frac{49}{\sqrt{n}} < 0.5 \Rightarrow n = 36894$$

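The z-based cow computations can be replayed in a few lines. A minimal sketch, treating 49 lbs as a known $\sigma$ exactly as this part of the exercise does; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar, sigma, n, cl = 1368, 49, 10, 0.95
z = NormalDist().inv_cdf(cl + (1 - cl) / 2)  # norm.s.inv(0.975)

# 95% confidence interval for the mean weight
me = z * sigma / math.sqrt(n)
lb, ub = x_bar - me, x_bar + me

# Trailer check: load if every cow weighed the upper bound, and the
# largest mean weight that still fits 35 cows under 48000 lbs
worst_case_load = 35 * ub
max_mean_weight = 48000 / 35

# Smallest n with interval length 2 * ME < 1, i.e. n > (z * sigma / 0.5) ** 2
n_needed = math.floor((z * sigma / 0.5) ** 2) + 1

print(round(lb, 2), round(ub, 2))
print(round(worst_case_load, 2), round(max_mean_weight, 2))
print(n_needed)
```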
Knowing t-distribution

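$\bar{x} = 1368$
$s = 49$
$t_{\alpha/2, n-1} = \text{t.inv}(0.975, 9) = 2.26$
$LB = \bar{x} - t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} = 1332.95$
$UB = \bar{x} + t_{\alpha/2, n-1}\frac{s}{\sqrt{n}} = 1403.05$
This interval is wider, because we used the same standard deviation but a larger critical value (due to the use of the t-distribution)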
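The width comparison can be made concrete by building both intervals from the same data. In this sketch the t critical value is hard-coded as 2.262157, the unrounded value of t.inv(0.975, 9) that the notes round to 2.26; that hard-coding and the helper name `interval` are assumptions made to keep the example standard-library only.

```python
import math
from statistics import NormalDist

x_bar, s, n = 1368, 49, 10
std_err = s / math.sqrt(n)  # same standard deviation for both intervals

z_crit = NormalDist().inv_cdf(0.975)
t_crit = 2.262157  # assumed value of t.inv(0.975, 9), hard-coded


def interval(center: float, crit: float, se: float) -> tuple[float, float]:
    """(LB, UB) for a margin of error of crit * se around center."""
    me = crit * se
    return center - me, center + me


z_lb, z_ub = interval(x_bar, z_crit, std_err)
t_lb, t_ub = interval(x_bar, t_crit, std_err)

print("z:", round(z_lb, 2), round(z_ub, 2), "length", round(z_ub - z_lb, 2))
print("t:", round(t_lb, 2), round(t_ub, 2), "length", round(t_ub - t_lb, 2))
```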