2.6 - Measures of dispersion
Dispersion refers to how varied data are in a data set or how spread out the distribution of the data is
There are many ways to measure 'dispersion'
Range
Most straightforward measure to calculate: $\text{range} = \max - \min$
Because we are subtracting values, the data must be quantitative (interval or ratio)
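As a minimal sketch of the range, assuming a small made-up data set (the values and the helper name `value_range` are illustrative only, not from the notes):

```python
# Hypothetical data set, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


data_range = value_range(data)
print(data_range)  # 38
```

Only the two extreme values enter the result, so a single outlier can change the range a lot.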
Interquartile Range (IQR)
IQR is the range of the middle 50% of the data: $\text{IQR} = Q_3 - Q_1$
equal to the length of the box in the box plot
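One possible way to compute the IQR is with `statistics.quantiles` from the Python standard library. The data set below is hypothetical, and the `"inclusive"` method is an assumption: textbooks and software use several slightly different quartile conventions, so other tools may give other values for the same data.

```python
from statistics import quantiles

# Hypothetical data set, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quarters and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
print(q1, q3, iqr)  # 3.0 7.0 4.0
```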
Deviation
Previous measures took only some of the values into account (min and max for range, $Q_1$ and $Q_3$ for IQR)
Another way to include all the data is to compare how far away each piece of data is from a central value
The most logical and common choice of central value is the mean
After calculating all of the deviations, we usually take the average
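A short sketch of these steps, assuming a made-up data set: compute each deviation from the mean, then average them. The positive and negative deviations cancel, so the average comes out as zero.

```python
from statistics import mean

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 6, 8, 10]

center = mean(data)                       # central value: the mean
deviations = [x - center for x in data]   # signed distance of each value from the mean
average_deviation = sum(deviations) / len(deviations)

print(deviations)         # [-4, -2, 0, 2, 4]
print(average_deviation)  # 0.0
```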
Mean Absolute Deviation (MAD)
After working through the Example, we conclude that the average deviation from the mean will always be zero. But the set doesn't have zero spread, does it?
To fix that, we usually use the mean absolute deviation (MAD). It calculates the deviation as the average distance of the values from the central value.
The MAD is minimized if we take the median as the central value
The difference between deviation and absolute deviation is the difference between displacement and distance
Example
Data set
mean -
deviations:
average deviation =
absolute deviation =
We can also calculate deviation from the median
median -
deviations:
average deviation =
absolute deviation =
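The MAD around the mean and around the median can be compared with a small sketch. The data set and the helper name `mean_absolute_deviation` are assumptions made for illustration; they are not the values from the Example above.

```python
from statistics import mean, median


def mean_absolute_deviation(values, center):
    """Average distance of the values from the chosen central value."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical data set with one large value, so the mean and median differ.
data = [1, 2, 3, 4, 10]

mad_about_mean = mean_absolute_deviation(data, mean(data))      # center = 4
mad_about_median = mean_absolute_deviation(data, median(data))  # center = 3

print(mad_about_mean)    # 2.4
print(mad_about_median)  # 2.2
```

For this data the median gives the smaller value, in line with the claim that the median minimizes the MAD.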
Variance
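The distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ on a plane is equal to $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
And the distance between the central value $c$ and the data $(x_1, \dots, x_n)$ is $\sqrt{\sum_{i=1}^{n} (x_i - c)^2}$
It is built from the sum of squared deviations from $c$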
When considering MAD, the median minimizes the sum of the absolute deviations. This is not the case for squared deviations, where the mean gives the smallest value
For this reason, we define the variance of a population as the average of squared deviations from the mean: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
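A minimal sketch of the population variance, assuming a made-up data set and a hypothetical helper `mean_squared_deviation`. It also checks, for this one data set, that the mean gives a smaller average squared deviation than the median does.

```python
from statistics import mean, median, pvariance


def mean_squared_deviation(values, center):
    """Average of the squared deviations from the chosen central value."""
    return sum((x - center) ** 2 for x in values) / len(values)


# Hypothetical data set, chosen only for illustration.
data = [1, 2, 3, 4, 10]

variance = mean_squared_deviation(data, mean(data))        # center = 4
about_median = mean_squared_deviation(data, median(data))  # center = 3

print(variance)         # 10.0
print(about_median)     # 11.0
print(pvariance(data))  # standard-library population variance, for comparison
```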
Variance in large population
In large populations, we use sample statistics.
When calculating the average of squared deviations from the sample mean using sample data, the computation tends to underestimate the variance of the larger population, most noticeably for small samples
Because of this, the sample variance formula is adjusted slightly to account for that difference: we divide by $n - 1$ instead of $n$, so $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
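The underestimate can be seen exactly on a tiny case. The sketch below assumes a hypothetical five-value population and lists every possible sample of size 2 drawn with replacement (an assumption made so that the draws are independent). Averaged over all samples, dividing by $n$ gives half the population variance here, while dividing by $n - 1$ recovers it.

```python
from fractions import Fraction
from itertools import product


def sum_of_squares_over(values, divisor):
    """Sum of squared deviations from the values' own mean, divided by `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / divisor


# Hypothetical population, chosen only for illustration.
population = [1, 2, 3, 4, 10]
population_variance = sum_of_squares_over(population, len(population))

n = 2
samples = list(product(population, repeat=n))  # all 25 ordered samples, with replacement

# Average each estimator over every possible sample.
avg_divide_by_n = sum(sum_of_squares_over(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_of_squares_over(s, n - 1) for s in samples) / len(samples)

print(population_variance)      # 10
print(avg_divide_by_n)          # 5
print(avg_divide_by_n_minus_1)  # 10
```

In the Python standard library, `statistics.pvariance` divides by $n$ and `statistics.variance` divides by $n - 1$.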
Standard deviation
The variance is a powerful measure, but it's hard to interpret because of the squared units. Because of that, we introduce the standard deviation, which is the square root of the variance.
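As a short sketch, assuming a made-up data set treated as a whole population: if the values were lengths in metres, the variance would be in square metres and the standard deviation back in metres.

```python
import math
from statistics import pstdev, pvariance

# Hypothetical data set, treated as a complete population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = pvariance(data)     # average squared deviation from the mean (mean is 5)
std_dev = math.sqrt(variance)  # square root brings the units back to those of the data

print(variance)  # 4
print(std_dev)   # 2.0
```

`statistics.pstdev(data)` computes the same population standard deviation in one call.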
End remark
A fact worth noting: all of the measures discussed (Range, IQR, MAD, Variance, Standard Deviation) are always non-negative.
Additionally, they will be equal to 0 for constant data sets
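This remark can be checked with a small sketch. The helper `dispersion_measures` and both data sets are hypothetical, the MAD is taken around the mean, and the quartiles use the `"inclusive"` convention of `statistics.quantiles`, which is only one of several in use.

```python
from statistics import mean, pstdev, pvariance, quantiles


def dispersion_measures(values):
    """Return the five measures from this section for one data set."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = mean(values)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "mad": sum(abs(x - center) for x in values) / len(values),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }


constant = dispersion_measures([7, 7, 7, 7])          # no spread at all
varied = dispersion_measures([4, 8, 15, 16, 23, 42])  # some spread

print(constant)
print(varied)
```

Every measure is 0 for the constant set and positive for the varied one.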
Visualization of mean and standard deviation
For data set
What should we do to the set, if we want to keep the mean constant, but decrease the standard deviation?
We move the values from both sides of the mean by the same amount closer to the mean
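A minimal sketch of that adjustment, assuming a made-up data set and a shift of 1: the lowest value moves up and the highest value moves down by the same amount, so the sum (and therefore the mean) is unchanged while the spread shrinks.

```python
from statistics import mean, pstdev

# Hypothetical data set, chosen only for illustration.
original = [2, 4, 6, 8, 10]
shift = 1

adjusted = sorted(original)  # sorted copy; `original` is left untouched
adjusted[0] += shift         # lowest value moves up, toward the mean
adjusted[-1] -= shift        # highest value moves down, toward the mean

print(mean(original), mean(adjusted))      # 6 6
print(pstdev(original), pstdev(adjusted))  # the second value is smaller
```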