2.8, 2.9 - Measuring grouped data

Measures of Median and Mean on Grouped Data

Section 2.7, "Connecting measures together", focused on calculating descriptive statistics on raw data, where we see every single observation. Now we focus on calculating descriptive statistics on grouped data.

Mean and median of data in a Frequency Table

Student Score   Frequency
3               1
4               1
5               3
6               5
7               5
8               7
9               5
10              3

We could recreate the original data set {3, 4, 5, 5, 5, ...}, but for large data sets this is error-prone and time-consuming. Instead, we compute the measures directly from the frequencies $f_i$.

Sample mean

$$\bar{x} = \frac{\sum x_i}{n} = \frac{\sum (x_i f_i)}{\sum f_i}$$

Population mean

$$\mu = \frac{\sum x_i}{N} = \frac{\sum (x_i f_i)}{\sum f_i}$$
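As a quick check of the frequency-table formula, the sketch below computes the mean and median of the student score table directly from the (score, frequency) pairs, without rebuilding the raw data set. The function names `freq_mean` and `freq_median` are illustrative and not taken from the notes.

```python
# Student score frequency table from the notes.
scores = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 1, 3, 5, 5, 7, 5, 3]


def freq_mean(values, freqs):
    """Mean from a frequency table: sum(x_i * f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


def freq_median(values, freqs):
    """Median from a frequency table whose values are sorted ascending."""
    n = sum(freqs)

    def value_at(k):
        # Walk the cumulative frequencies until they cover position k (1-indexed).
        cumulative = 0
        for x, f in zip(values, freqs):
            cumulative += f
            if k <= cumulative:
                return x
        raise ValueError("position out of range")

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # Even n: average the two middle observations.
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2


mean_score = freq_mean(scores, freqs)      # 218 / 30, about 7.27
median_score = freq_median(scores, freqs)  # 7.5 (15th value is 7, 16th is 8)
print(mean_score, median_score)
```

The median is read off the cumulative frequencies: with 30 observations, the two middle positions are 15 and 16, which fall on the scores 7 and 8.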

Mean and median of data in a relative frequency table

Sample mean

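$$\bar{x} = \frac{\sum x_i}{n} = \sum \left( x_i \cdot \frac{f_i}{n} \right) = \sum (x_i P(x_i)), \quad \text{where } P(x_i) = \frac{f_i}{n}$$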

Population mean

$$\mu = \sum (x_i P(x_i))$$
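To illustrate that the relative frequency form gives the same result as the frequency form, the following sketch converts the student score table into relative frequencies and computes the mean both ways. The variable names are illustrative.

```python
# Student score frequency table from the notes.
scores = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 1, 3, 5, 5, 7, 5, 3]
n = sum(freqs)

# Relative frequency P(x_i) = f_i / n for each score.
rel_freqs = [f / n for f in freqs]

# Mean from the relative frequency table: sum(x_i * P(x_i)).
mean_from_rel = sum(x * p for x, p in zip(scores, rel_freqs))

# Mean from the plain frequency table, for comparison.
mean_from_freq = sum(x * f for x, f in zip(scores, freqs)) / n

print(mean_from_rel, mean_from_freq)  # equal up to floating-point rounding
```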

Weighted measures

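Weighted measures are calculated exactly like the relative frequency measures, with the weight taking the place of the relative frequency. If the weights do not sum to 1, divide the weighted sum by the total weight.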
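A minimal sketch of a weighted mean follows. The grades and weights are hypothetical values chosen only for illustration, and `weighted_mean` is an illustrative name; dividing by the total weight means the weights do not have to sum to 1.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(x_i * w_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical example: three course components and their weights.
grades = [90, 80, 70]
weights = [0.2, 0.3, 0.5]
course_grade = weighted_mean(grades, weights)  # 18 + 24 + 35 = 77, up to rounding
print(course_grade)
```

Passing the weights as 2, 3, 5 instead of 0.2, 0.3, 0.5 gives the same result, because only the proportions between the weights matter.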

Central measures on grouped data with loss of information

Previously, we only worked with data whose exact values were known to us. Now we focus on situations where we are given only intervals of data.

Age Intervals   Frequency   Relative Frequency
[20, 29)        1441        1441/25466 = 5.66%
[30, 39)        2477        9.73%
[40, 49)        4971        19.52%
[50, 59)        7438        29.21%
[60, 69)        6367        25.00%
[70, 79)        2314        9.09%
80+             458         1.80%
TOTAL           25466       100%

We cannot calculate the exact mean or median of the values, because we do not know the exact values inside each age interval. We can only estimate them.

Median

We find the interval in which the 50th percentile falls, and then take the midpoint $m_i$ of that interval. This is a crude estimate that can miss the true median by a noticeable margin, but it is the best we can do in this situation.
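The procedure above can be sketched for the age table as follows. The function name `median_interval` is illustrative, the interval endpoints are used exactly as printed in the table, and the sketch assumes the median does not fall in the open-ended "80+" class, which has no midpoint.

```python
# Age table from the notes; None marks the open upper end of the "80+" class.
intervals = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, None)]
freqs = [1441, 2477, 4971, 7438, 6367, 2314, 458]


def median_interval(intervals, freqs):
    """Return the first interval whose cumulative frequency reaches half the total."""
    half = sum(freqs) / 2
    cumulative = 0
    for interval, f in zip(intervals, freqs):
        cumulative += f
        if cumulative >= half:
            return interval
    raise ValueError("empty table")


low, high = median_interval(intervals, freqs)
# Midpoint uses the endpoints exactly as printed in the table.
median_estimate = (low + high) / 2
print((low, high), median_estimate)  # (50, 59) 54.5
```

Half of 25466 is 12733; the cumulative frequency is 8889 after the [40, 49) class and 16327 after the [50, 59) class, so the median falls in [50, 59).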

Mean

To estimate the mean, we take the midpoint $m_i$ of each interval, multiply it by the relative frequency of that interval, and sum those products.

Sample mean

$$\bar{x} \approx \frac{\sum (m_i f_i)}{\sum f_i} = \sum (m_i P(m_i))$$

Population mean

$$\mu \approx \sum (m_i P(m_i))$$
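As an illustration, the sketch below estimates the mean age from the table using interval midpoints. Two assumptions are made that the notes do not state: the midpoints are computed from the endpoints as printed (for example 24.5 for [20, 29)), and the open-ended "80+" class is treated as [80, 89) with midpoint 84.5. A different choice for the last class would shift the estimate slightly.

```python
# Midpoints of the age intervals, using the endpoints as printed in the table.
# ASSUMPTION: the open-ended "80+" class is treated as [80, 89), midpoint 84.5.
midpoints = [24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]
freqs = [1441, 2477, 4971, 7438, 6367, 2314, 458]
n = sum(freqs)  # 25466

# Estimate via frequencies: sum(m_i * f_i) / sum(f_i).
mean_from_freq = sum(m * f for m, f in zip(midpoints, freqs)) / n

# Same estimate via relative frequencies: sum(m_i * P(m_i)).
rel_freqs = [f / n for f in freqs]
mean_from_rel = sum(m * p for m, p in zip(midpoints, rel_freqs))

print(round(mean_from_freq, 2), round(mean_from_rel, 2))  # both about 53.76
```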

Variance and Standard Deviation on Grouped Data

Measures of spread

When we take a sample from a population, the variance and standard deviation calculated with the traditional (population) formulas tend to underestimate the actual population values. For this reason, we use a slightly modified formula for sample calculations: the denominator is reduced by 1.

For grouped data, we again use the midpoints of the intervals to calculate the measures.

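Variance for sample data

$$s^2 = \frac{\sum \left[ (m_i - \bar{x})^2 f_i \right]}{\left( \sum f_i \right) - 1}$$

Standard deviation for sample data

$$s = \sqrt{\frac{\sum \left[ (m_i - \bar{x})^2 f_i \right]}{\left( \sum f_i \right) - 1}} = \sqrt{s^2}$$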
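The sample formulas can be sketched in code as follows. The function name `grouped_sample_variance` is illustrative, and the small grouped table is hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math


def grouped_sample_variance(midpoints, freqs):
    """Sample variance from grouped data: sum((m_i - mean)^2 * f_i) / (sum(f_i) - 1)."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    squared_dev = sum((m - mean) ** 2 * f for m, f in zip(midpoints, freqs))
    return squared_dev / (n - 1)


# Hypothetical grouped sample: three intervals with midpoints 5, 15, 25.
midpoints = [5, 15, 25]
freqs = [2, 3, 5]

s2 = grouped_sample_variance(midpoints, freqs)  # 610 / 9, about 67.78
s = math.sqrt(s2)                               # about 8.23
print(s2, s)
```

Here the estimated mean is 180 / 10 = 18, the weighted squared deviations sum to 338 + 27 + 245 = 610, and the denominator is 10 - 1 = 9.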

These measures can also be derived for relative frequency tables, but only for populations: for samples, the subtraction of 1 in the denominator makes it impossible to rewrite the formula purely in terms of relative frequencies.
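The short derivation below sketches why the rewrite works for a population but not for a sample. It assumes the population symbols $\sigma^2$ and $N = \sum f_i$ as analogues of the sample symbols used above, with $P(m_i) = f_i / N$ for a population and $P(m_i) = f_i / n$ for a sample; the population variance formula itself is not stated in these notes.

```latex
% Population: dividing by N = \sum f_i turns every f_i into a relative frequency.
\sigma^2 \approx \frac{\sum \left[ (m_i - \mu)^2 f_i \right]}{N}
         = \sum \left[ (m_i - \mu)^2 \frac{f_i}{N} \right]
         = \sum \left[ (m_i - \mu)^2 P(m_i) \right]

% Sample: dividing by n - 1 leaves a factor that still depends on n.
s^2 = \frac{\sum \left[ (m_i - \bar{x})^2 f_i \right]}{n - 1}
    = \frac{n}{n - 1} \sum \left[ (m_i - \bar{x})^2 P(m_i) \right]
```

The leftover factor $n / (n - 1)$ cannot be recovered from relative frequencies alone, so the sample size must be known in addition to the table.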