7 - Hypothesis Testing

Hypothesis Testing

Introduction

Definitions

Hypothesis - a claim about reality
Competing Hypotheses - a pair of opposing hypotheses regarding a particular aspect, such that exactly one of them must be true
Null Hypothesis H0 - Hypothesis that is acted upon as if true by default
Alternative Hypothesis H1 - Hypothesis not presumed to be true in our actions

Conclusions

If there is enough evidence against the H0 then it is rejected
We reject the null hypothesis in favour of the alternative one
This does NOT prove H0 false. This does NOT prove H1 true
There is sufficient evidence to begin acting in accord with the alternative hypothesis
If there is not enough evidence against the H0, then it is not rejected
We fail to reject the null hypothesis
This does NOT prove H0 true, nor H1 false
There is not a strong enough reason to move our default position on how to act away from H0

Practical judgment is necessary when determining how to act in accord with a hypothesis and the evidence

Consequences of Hypothesis Testing and Assessing Evidence

Errors

Type I Error - Rejecting a true null hypothesis
Type II Error - Failing to reject a false null hypothesis

Determining Sufficient Evidence

Given a true null hypothesis

Assessing Evidence

The p-value is the probability of obtaining evidence at least as rare as the evidence collected under the assumption that the null hypothesis is true
If the p-value is very small, it indicates that something rare or unusual happened in the case that the null hypothesis is true
If the p-value is not very small, it indicates that something routine or usual happened in the case that the null hypothesis is true
If the p-value is small enough, it makes us question whether assuming the null hypothesis is true is the right way to act
For single-tail tests, there are infinitely many parameter values in the null hypothesis. The border value is chosen for the calculation as it provides the worst-case probability value
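The worst-case role of the border value can be checked numerically. The sketch below assumes a right-tailed test on a mean with known σ; the helper `right_tail_p_value` and all the numbers in it are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def right_tail_p_value(sample_mean, mu, sigma, n):
    """P(sample mean >= observed value) when the true mean is mu and sigma is known."""
    standard_error = sigma / n ** 0.5
    return 1 - NormalDist(mu, standard_error).cdf(sample_mean)

# Hypothetical setup: H0: mu <= 100, H1: mu > 100, sigma = 15, n = 36, observed mean 104.
mu_0, sigma, n, observed = 100, 15, 36, 104

# Several values of mu that all satisfy H0; the last one is the border value.
null_values = [90, 95, 98, 100]
p_values = {mu: right_tail_p_value(observed, mu, sigma, n) for mu in null_values}

for mu, p in p_values.items():
    print(f"mu = {mu}: p-value = {p:.4f}")
```

Every value of μ inside the null hypothesis gives a smaller p-value than the border value does, so if the evidence is strong enough to reject at the border, it is strong enough for every other value in H0.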

Determining Sufficiency of Evidence

We set the standard for sufficient evidence by determining which p-values are small enough to reject the null hypothesis
If the p-value is less than the value we set, we reject the null hypothesis
We call this value the α-value or significance level of the test
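One consequence of this rule is that α is also the chance of a Type I error when the null hypothesis is true. The simulation below is a rough sketch of that idea, assuming a two-sided z-test with known σ; the function names, sample size, trial count and seed are all illustrative choices, not part of the notes.

```python
import random
from statistics import NormalDist, mean

def two_sided_z_p_value(sample, mu_0, sigma):
    """Two-sided p-value for H0: mu = mu_0 with known sigma."""
    z = (mean(sample) - mu_0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(alpha, trials=2000, n=30, seed=0):
    """Fraction of simulated samples that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The data really come from N(0, 1), so H0: mu = 0 is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        if two_sided_z_p_value(sample, mu_0=0, sigma=1) < alpha:
            rejections += 1
    return rejections / trials

rate = rejection_rate(alpha=0.05)
print(f"Type I error rate at alpha = 0.05: {rate:.3f}")
```

The printed rate should land close to 0.05: each rejection here is a Type I error, because the null hypothesis was true in every simulated sample.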

Hypothesis Testing Procedure

  1. Identify competing hypotheses
  2. Set significance level α
  3. Determine sample size n
    The p-value is a probability calculated from the sampling distribution. Select a sample size large enough that the shape of the sampling distribution can be approximated
  4. Select a sample via SRS
  5. Collect and study the sample. Compute sample statistics
  6. Compute the p-value and make statistical conclusion
    p-value < α - reject H0 in favor of H1
    p-value ≥ α - fail to reject H0
  7. In light of the statistical conclusion, determine if a change in behaviour is prudent and practical

Claims on Means

Hypothesis Tests on Means

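$$
\begin{cases} H_0: \mu \le \mu_0 \\ H_1: \mu > \mu_0 \end{cases}
\qquad
\begin{cases} H_0: \mu \ge \mu_0 \\ H_1: \mu < \mu_0 \end{cases}
\qquad
\begin{cases} H_0: \mu = \mu_0 \\ H_1: \mu \ne \mu_0 \end{cases}
$$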

Each member of the population has a numerical value associated with it. We are interested in the average of these values
Need n>30 (SDSM normal) or population normally distributed
Need a SRS

Computing the p-value

The sample statistic that we compute from our collected sample is our evidence. Its value resides along the horizontal axis of the sampling distribution
The p-value is the probability of obtaining a statistic at least as extreme as the one observed, under the assumption that the null hypothesis is true in the worst-case scenario (parameter equal to the border value)
If we know the standard deviation of the sampling distribution, we can calculate the p-value easily
If we do not know the standard deviation of the sampling distribution, we must transform our problem
Many people like to transform the problem regardless of whether we know the standard deviation of the sampling distribution

The sampling distribution is approximately normal
If σ is known, we transform the problem using z-score, transforming sample evidence into a test statistic z

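$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

which resides in the z-distribution
If $\sigma$ is unknown, we transform the problem using the t-score, transforming our sample evidence into a test statistic t

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$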

which resides in the t-distribution
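The two transformations can be written as small helper functions. This is a minimal sketch: the function names are hypothetical, and the inputs are the rounded summary values from the paired-difference and delivery exercises in these notes.

```python
from math import sqrt

def z_statistic(sample_mean, mu_0, sigma, n):
    """Test statistic when the population standard deviation sigma is known."""
    return (sample_mean - mu_0) / (sigma / sqrt(n))

def t_statistic(sample_mean, mu_0, s, n):
    """Test statistic when sigma is unknown and the sample standard deviation s is used."""
    return (sample_mean - mu_0) / (s / sqrt(n))

# Delivery exercise: sigma is known, so the z-score applies.
z = z_statistic(sample_mean=24.85, mu_0=24.0, sigma=2, n=35)

# Paired-difference exercise: only s is known, so the t-score applies.
t = t_statistic(sample_mean=2.32, mu_0=0, s=2.997, n=14)

print(f"z = {z:.2f}, t = {t:.2f}")
```

The arithmetic is identical in both cases. What changes is the reference distribution used afterwards to turn the statistic into a p-value.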

Exercises

Hypothesis Test
A software designer is testing the efficiency of the new version of a program. He hopes that the new version is faster than the last one. Both versions are run 14 times (different computers, same set of websites). Test the developer's claim at the 0.01 level of significance, assuming that the differences come from a normal distribution.

Old New
12.4 10.2
13 7.8
14.5 9.3
13.2 14
10 7
16 16
14.3 16.6
9.3 5.6
14.2 15.7
9.3 7.1
11.5 7.8
13.6 6.1
9.2 4
12.4 13.2
Hypothesis tests

For the program to be faster:

$$\begin{cases} H_0: \mu_d \le 0 \\ H_1: \mu_d > 0 \end{cases}$$

We calculate all differences as old − new. If the average difference is positive, then the new version is faster
α=0.01
n=14
Conditions: random sample, differences approximately normal
t-test
$\bar{x}_d = 2.32$
$s_d = 2.997$

$$t = \frac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}} = \frac{2.32 - 0}{2.997 / \sqrt{14}} \approx 2.898$$

This t value is the number of standard errors by which the sample mean difference lies above 0

We put μ0=0, because our question is 'is faster than', and we operate on differences, so anything >0 means it's faster.

p-value: p = 1 − t.dist(2.898, 13, 1) = 0.006
p-value<α - so we reject the null
Pasted image 20260428122229.png
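The green area is the p-value: the area under the t-distribution (13 degrees of freedom) to the right of t = 2.898. It is the probability of seeing a test statistic at least this large if H0 (the new version is no faster) were true. Because this area is smaller than α, we reject H0 in favour of H1, that the new version is faster
The area to the left of the 2.898 border is 1 − p-value ≈ 0.994. It is not the probability that the new version is faster; it is the probability, under H0, of a test statistic smaller than the one observed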
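The whole exercise can be reproduced from the raw table. The sketch below uses only the Python standard library, so the t-distribution tail area is obtained by integrating the density numerically with Simpson's rule; that integration helper is an assumption made here to keep the example self-contained, and a statistics package would normally supply this value directly.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T >= t), integrating the density from 0 to |t| with Simpson's rule."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t if t >= 0 else 0.5 + area_0_to_t

old = [12.4, 13, 14.5, 13.2, 10, 16, 14.3, 9.3, 14.2, 9.3, 11.5, 13.6, 9.2, 12.4]
new = [10.2, 7.8, 9.3, 14, 7, 16, 16.6, 5.6, 15.7, 7.1, 7.8, 6.1, 4, 13.2]

# old - new: a positive difference means the new version was faster on that run.
diffs = [old_time - new_time for old_time, new_time in zip(old, new)]
n = len(diffs)
alpha = 0.01

t = (mean(diffs) - 0) / (stdev(diffs) / sqrt(n))
p_value = t_right_tail(t, df=n - 1)

print(f"mean = {mean(diffs):.2f}, sd = {stdev(diffs):.3f}, t = {t:.3f}")
print(f"p-value = {p_value:.3f}, reject H0: {p_value < alpha}")
```

Working from the unrounded mean and standard deviation is what gives t ≈ 2.898; plugging in the rounded values 2.32 and 2.997 gives a slightly smaller statistic.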

Claims on Proportions

Hypothesis Tests on Proportions

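$$
\begin{cases} H_0: p \le p_0 \\ H_1: p > p_0 \end{cases}
\qquad
\begin{cases} H_0: p \ge p_0 \\ H_1: p < p_0 \end{cases}
\qquad
\begin{cases} H_0: p = p_0 \\ H_1: p \ne p_0 \end{cases}
$$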

We are interested in the percentage of members with a particular quality
We need np0>5 and nq0>5 (SDSP normal) and SRS

Z-Distribution

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$

where $q_0 = 1 - p_0$, and which resides in the z-distribution
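A compact way to apply this statistic is a single function that checks the normality condition, computes z, and returns the tail area for the chosen alternative. The function name and its `tail` argument are hypothetical; the inputs are the counts from the two proportion exercises in this section.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, tail):
    """Return (z, p_value) for a one-proportion z-test.

    tail is "left" (H1: p < p0), "right" (H1: p > p0) or "two" (H1: p != p0).
    """
    q0 = 1 - p0
    if n * p0 <= 5 or n * q0 <= 5:
        raise ValueError("normal approximation not justified: need n*p0 > 5 and n*q0 > 5")
    z = (successes / n - p0) / sqrt(p0 * q0 / n)
    left_area = NormalDist().cdf(z)
    if tail == "left":
        p_value = left_area
    elif tail == "right":
        p_value = 1 - left_area
    elif tail == "two":
        p_value = 2 * min(left_area, 1 - left_area)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value

# 88 of 115 prices end in 9 or 5 (left-tailed); 68 of 73 leaders are extroverted (two-tailed).
z_prices, p_prices = proportion_z_test(88, 115, 0.80, tail="left")
z_leaders, p_leaders = proportion_z_test(68, 73, 0.82, tail="two")

print(f"prices:  z = {z_prices:.2f}, p-value = {p_prices:.3f}")
print(f"leaders: z = {z_leaders:.2f}, p-value = {p_leaders:.3f}")
```

Doubling the smaller tail in the two-sided case counts evidence that is equally extreme in either direction.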

Exercises

Harper's Index reported that 80% of all supermarket prices end in the digits 9 or 5. From your recollection, this estimate seems too high. You randomly select 115 items from supermarket price catalogs and find that 88 have prices that end with a 9 or 5. Test the claim at the 1% level of significance.

$$\begin{cases} H_0: p \ge 80\% \\ H_1: p < 80\% \end{cases}$$

α=0.01
random
n=115
p0=0.8
np0>5 and nq0>5 so normal
88 of the 115 items have a price ending in 5 or 9, so $\hat{p} = 88/115 \approx 0.765$
Doing distribution:
Pasted image 20260428123838.png
$\mu_{\hat{p}} = 0.8$
$\sigma_{\hat{p}} = \sqrt{\frac{0.8 \cdot 0.2}{115}} = 0.037$
p-value (green area) = norm.dist(88/115, 0.8, 0.037, 1) ≈ 0.176

p-value is bigger than our α, so we fail to reject the hypothesis H0


Myers-Briggs estimates that about 82% of college student government leaders are extroverted. A random sample of 73 student government leaders were given the Myers-Briggs personality test; 68 of them were found to be extroverted. Test the claim that the actual percentage is different from what was reported, at a significance level of 2%

$$\begin{cases} H_0: p = 0.82 \\ H_1: p \ne 0.82 \end{cases}$$

α=0.02
n=73, p0=0.82
it is random
np0>5 and nq0>5 so it is normal
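$\hat{p} = 68/73 \approx 0.9315$
Doing test statistic:
$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.9315 - 0.82}{\sqrt{0.82 \cdot 0.18 / 73}} = 2.48$
Pasted image 20260428125153.png
p-value = (1 − norm.s.dist(2.48, 1)) · 2 = 0.013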
p-value<α so we can reject the null


According to the Pew Research Center, the average number of FB friends is 338. A random sample of 100 users had an average of 287 friends with a standard deviation of 50 friends. Test the claim that the average number of friends is less than reported by Pew using a significance level of your choosing.

$$\begin{cases} H_0: \mu \ge 338 \\ H_1: \mu < 338 \end{cases}$$

α=0.05, n=100, $\bar{x}$=287, s=50
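The notes stop after listing the givens. One way the test could be finished, assuming the t-score is used because only the sample standard deviation s is available:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{287 - 338}{50 / \sqrt{100}} = \frac{-51}{5} = -10.2$$

A statistic more than 10 standard errors below 338 leaves a left-tail area (99 degrees of freedom) far smaller than α=0.05, so under these assumptions H0 would be rejected in favour of H1.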

A delivery service claims to deliver packages from NYC to LA in 24 hours on average. It is generally agreed that delivery services have a standard deviation of 2 hours. Several complaints have been made about this particular service taking longer than advertised. An independent consumer agency conducted a study. A random sample of 35 packages produces a mean of 24.85 hours. Test the claim at α=0.04

$$\begin{cases} H_0: \mu = 24.0 \\ H_1: \mu \ne 24.0 \end{cases}$$

n=35, $\bar{x}$=24.85, σ=2, α=0.04
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{24.85 - 24}{2 / \sqrt{35}} = 2.51$
p-value = 2 · (1 − norm.s.dist(2.51, 1)) = 0.012
p-value<α so we reject the H0
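The computation above can be checked with the standard library. Because the complaints are specifically about deliveries taking longer, a right-tailed alternative (H1: μ > 24) is arguably the more natural reading of the problem; the sketch below reports both versions so they can be compared. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, sample_mean, alpha = 24.0, 2.0, 35, 24.85, 0.04

z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Area to the right of z under the standard normal curve.
right_tail = 1 - NormalDist().cdf(z)   # H1: mu > 24 (deliveries take longer)
# z is positive here, so doubling the right tail gives the two-sided p-value.
two_tailed = 2 * right_tail            # H1: mu != 24, as written in the notes

print(f"z = {z:.2f}")
print(f"right-tailed p-value = {right_tail:.3f}, reject H0: {right_tail < alpha}")
print(f"two-tailed p-value   = {two_tailed:.3f}, reject H0: {two_tailed < alpha}")
```

Both alternatives lead to rejecting H0 at α=0.04, so the conclusion does not depend on which one is chosen here.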