2.7 - Connecting measures together

Connecting Measures of Central Tendency & Measures of Dispersion

Previous sections introduced two classes of measures in descriptive statistics: measures of central tendency and measures of dispersion.

Example from 2.6

Visualization of mean and standard deviation

For the data set
{4,7,13,21,28,33,42,61}
μ=26.125 and σ=17.93
What should we do to the set if we want to keep the mean constant but decrease the standard deviation?
We move values on opposite sides of the mean closer to it by the same amount
{4+3,7+5,13,21,28,33,42−5,61−3}
μ=26.125 and σ=15.6, so the goal was met
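
The two sets can be checked with a short sketch. It assumes the adjusted set reads {7,12,13,21,28,33,37,58} (the outer pair moved by 3, the next pair by 5) and that σ is the population standard deviation; the variable names are only illustrative.

```python
from statistics import mean, pstdev

original = [4, 7, 13, 21, 28, 33, 42, 61]
# Assumed reading of the adjusted set: symmetric pairs moved toward the mean.
adjusted = [4 + 3, 7 + 5, 13, 21, 28, 33, 42 - 5, 61 - 3]

for name, data in (("original", original), ("adjusted", adjusted)):
    # pstdev is the population standard deviation (divides by n, not n - 1).
    print(name, mean(data), round(pstdev(data), 2))
```

Because every shift toward the mean on one side is matched by an equal shift on the other, the sum (and so the mean) is unchanged while the spread shrinks.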

Now, let's ask a question: what % of observations fall within 1 standard deviation of the mean?

μ±kσ

For 1 stdev:
μ−1σ=26.125−15.6=10.525
μ+1σ=26.125+15.6=41.725
So the values of the adjusted set falling in [μ−σ,μ+σ] are {12,13,21,28,33,37}, which is 6 of 8 observations, or 75%
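
The count can be reproduced with a small sketch. It assumes the adjusted set {7,12,13,21,28,33,37,58} and the population standard deviation; `within_k_sigma` is a hypothetical helper name, not something from the course.

```python
from statistics import mean, pstdev

def within_k_sigma(data, k):
    """Return the observations lying in [mu - k*sigma, mu + k*sigma]."""
    mu, sigma = mean(data), pstdev(data)
    low, high = mu - k * sigma, mu + k * sigma
    return [x for x in data if low <= x <= high]

adjusted = [7, 12, 13, 21, 28, 33, 37, 58]
inside = within_k_sigma(adjusted, 1)
# Prints the values inside the interval and their share of the data set.
print(inside, f"{len(inside) / len(adjusted):.0%}")
```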

Chebyshev's Inequality

Chebyshev's Inequality definition

Given any data set with mean μ and standard deviation σ, and any real number k>1, the proportion of observations that lie in the interval [μ−kσ,μ+kσ] is at least 1−1/k²

Using that, we can guarantee a minimum percentage of observations falling in an interval symmetric about the mean
If k=2, at least 1−1/2²=3/4=75% of the observations fall between μ−2σ and μ+2σ

At least half of all observations fall within k=√2 standard deviations of the mean, as 1−1/(√2)²=1−1/2=1/2
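
A minimal sketch of the bound as a function, to see how the guarantee grows with k. The function name is made up for illustration; the formula is the 1−1/k² bound stated above.

```python
from math import sqrt

def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (sqrt(2), 2, 3):
    print(f"k = {k:.3f}: at least {chebyshev_lower_bound(k):.1%}")
```

The bound is a guarantee for every data set, so actual data usually does better: the adjusted set above has 75% of its values within just 1 standard deviation.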

Normal distribution and Curve fitting

Chebyshev's Inequality applies to all distributions, but more precise connections can be made when we restrict our interest to particular classes of distributions
To express the shape of a normal distribution, we must discuss modelling distributions with functions or curves. Consider the following
Pasted image 20260203220834.png
The curve highlights the shape of the histogram. Increasing the number of classes would let the curve fit the histogram more closely, but that might not be possible with discrete variables and finite data sets.
The bar height represents the percentage of observations in that class
To read values off easily, we fit a curve from a common family of functions that closely resembles the histogram.

For a bell curve:

The inflection points of the standard bell curve lie exactly one σ away from μ, at μ−σ and μ+σ

Empirical Rule for Normal Distributions

Pasted image 20260205124344.png

Values that are (approximately) normally distributed in the real world

  • IQ
  • Height
  • Class scores

Implication of the Empirical Rule

Suppose the IQ distribution has a mean of 100 and a stdev of 15

Determine the % of adults with IQ between 70 and 130
70 is μ−2σ and 130 is μ+2σ, so it's 95%
Determine the % of adults with IQ greater than 100
100 is the mean, so 50%
Determine the % of adults with IQ less than 55
(55−100)/15=−3, so it's 3σ below the mean, which means 0.15%
Determine the % of adults with IQ between 85 and 145
85 is 1σ below the mean, 145 is 3σ above
From 1σ below up to the mean is 34%, from the mean up to 3σ above is 49.85%, which in total is 83.85%
Determine the % of adults with IQ between 95 and 105
(95−100)/15=−1/3 and (105−100)/15=1/3, so both are 1/3σ away from the mean
We cannot calculate that right now, but we'll be able to in ch.4
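
As a cross-check, the same questions can be answered numerically. This sketch assumes IQ follows an exact normal model with mean 100 and stdev 15 and uses `NormalDist` from Python's standard library; `pct_between` is a hypothetical helper. The Empirical Rule percentages are rounded, so the computed values differ slightly from them.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed model from the example above

def pct_between(low, high):
    """Percentage of the distribution lying between low and high."""
    return 100 * (iq.cdf(high) - iq.cdf(low))

print(f"70 to 130: {pct_between(70, 130):.2f}%")     # Empirical Rule: 95%
print(f"above 100: {100 * (1 - iq.cdf(100)):.2f}%")  # Empirical Rule: 50%
print(f"below 55:  {100 * iq.cdf(55):.2f}%")         # Empirical Rule: 0.15%
print(f"85 to 145: {pct_between(85, 145):.2f}%")     # Empirical Rule: 83.85%
print(f"95 to 105: {pct_between(95, 105):.2f}%")     # not covered by the rule
```

The last line covers the 1/3σ case, which the Empirical Rule alone cannot answer because it only gives areas at whole multiples of σ.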

z-score

The z-score of an observation is how many standard deviations away from the mean it is
z>0 - the observation is greater than the mean
z<0 - the observation is less than the mean

z=(x−μ)/σ
Examples

z=0 - equal to the mean
z<−2 - more than 2 standard deviations below the mean

Student got 76% on an exam. Class average was 71% and stdev 2.5%
z=(x−μ)/σ=(76−71)/2.5=5/2.5=2
Student got 85% on an exam. Class average was 80% and stdev 3%
z=(x−μ)/σ=(85−80)/3=5/3≈1.67

We cannot compare the scores against the other students' grades, but we can say that the first exam went better than the second relative to the class averages, since its z-score is higher
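
The two exam results can be compared with a short sketch of the z-score formula; the function and variable names are illustrative only.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

first = z_score(76, 71, 2.5)   # exam with class average 71% and stdev 2.5%
second = z_score(85, 80, 3)    # exam with class average 80% and stdev 3%
print(first, round(second, 2))
print("first exam is relatively better:", first > second)
```

Both scores are 5 points above their class average, but the smaller stdev on the first exam makes the same gap count for more.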

Classifying Observations Comparatively

Different errors may occur, and some are more egregious than others.
Various protocols can find outliers: values far away from the majority of data values
An unusual observation is a value in the top-most or bottom-most percentiles of the data set

These characterizations don't provide a standard for classifying observations, just a distinction between the ideas

Important

Just because an observation is an outlier doesn't mean it is an error

Definitions
Outlier

An observation that lies beyond the box of the box plot by more than 1.5 times the IQR, i.e. below Q1−1.5·IQR or above Q3+1.5·IQR

Unusual (2 definitions)

  • An observation is classified as unusual by the 2σ rule if it lies farther from the mean than 2 stdev: |z|>2
  • An observation is classified as unusual by the 3σ rule if it lies farther from the mean than 3 stdev: |z|>3
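
The three classifications can be sketched side by side. Several assumptions apply: quartile conventions differ between textbooks, so `method="inclusive"` is only one possible choice; σ is taken as the population standard deviation; and the value 150 is a made-up extreme added to the earlier data set purely for illustration.

```python
from statistics import mean, pstdev, quantiles

def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    # Assumption: "inclusive" quartiles; other conventions may move the fences.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def unusual(data, threshold=2):
    """Values with |z| > threshold (2 for the 2-sigma rule, 3 for the 3-sigma rule)."""
    mu, sigma = mean(data), pstdev(data)
    return [x for x in data if abs((x - mu) / sigma) > threshold]

# Earlier data set plus one hypothetical extreme value.
data = [4, 7, 13, 21, 28, 33, 42, 61, 150]
print("IQR outliers:   ", iqr_outliers(data))
print("unusual (2σ):   ", unusual(data, 2))
print("unusual (3σ):   ", unusual(data, 3))
```

The same observation can be flagged by one rule and not by another, which is why the rule in use should always be stated.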