A1.1 Data Distributions

1.1.9 Describing Numerical Distributions – Spread

Range/Spread

  • The range of a dataset is the difference in value between the largest and smallest data points. We can use the formula:

\text{Spread} = x_{\text{largest}} - x_{\text{smallest}}

  • Because the range depends only on the two most extreme values, it is most appropriate for datasets that contain few or no outliers. Depending on context, it may be used to measure the spread of either asymmetric or symmetric distributions.

Note: when dealing with 2-dimensional data (i.e. data with an x and a y value), range refers to the difference between the largest and smallest y values, while domain refers to the difference between the largest and smallest x values. When dealing with 1-dimensional data, the terms can be used interchangeably.

Example

For the dataset: 3 5 6 10

The spread is: 10-3=7
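
The same calculation can be sketched in Python. The function name `spread_range` is hypothetical and chosen only for illustration; it assumes the data is given as a non-empty list of numbers.

```python
def spread_range(data):
    """Return the range: the largest value minus the smallest value."""
    if not data:
        raise ValueError("data must contain at least one value")
    return max(data) - min(data)


# The dataset from the example above.
print(spread_range([3, 5, 6, 10]))  # 7
```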

Standard Deviation/Spread

  • This is generally the most appropriate measure of spread for symmetric distributions that do not contain outliers.
  • Standard deviation is represented by either a lower-case s or the Greek letter sigma, σ.
  • The formula for standard deviation is:

s=\sqrt{\frac{\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}}{N}}

Where N is the total number of data points, \bar{x} is the mean of the dataset, and x_{i} is the value of the i-th data point.

Example

For the dataset: 1 3 4 4

The number of data points: N=4

The mean of the dataset is: \bar{x}=\frac{1+3+4+4}{4}=3

The standard deviation is: s=\sqrt{\frac{(1-3)^{2}+(3-3)^{2}+(4-3)^{2}+(4-3)^{2}}{4}}=\sqrt{\frac{3}{2}} \approx 1.22 (rounded to 2 decimal places)
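
As a rough check, the formula above can be written out step by step in Python. The function name `std_dev` is hypothetical; the sketch divides by N exactly as the formula in this section does, which is the same convention as `statistics.pstdev` in the standard library.

```python
import math
import statistics


def std_dev(data):
    """Standard deviation using the formula above (dividing by N)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / n)


data = [1, 3, 4, 4]
print(round(std_dev(data), 2))  # 1.22

# Cross-check against the standard library's divide-by-N version.
print(math.isclose(std_dev(data), statistics.pstdev(data)))  # True
```

Note that some calculators and textbooks divide by N - 1 instead of N, which would give a slightly larger value for the same data.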

Choosing a Measure for the Spread: Range, Standard Deviation or IQR

  • The range, standard deviation and interquartile range (IQR) are all accepted measures for the spread of a distribution. There are no precise criteria for choosing the most appropriate measure, but there are some aspects to consider.
  • Both the range and the standard deviation are affected by extreme values and shape, while the IQR is not.
  • The standard deviation is the most relevant measure for normal distributions.
  • In general, the range is the least comprehensive of the three measures, because it uses only two of the data points.
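
The effect of an extreme value on each measure can be illustrated with a small sketch. The two datasets and the helper name `spread_measures` are made up for illustration. The quartiles come from `statistics.quantiles` with its default method, which is an assumption: other quartile conventions can give a slightly different IQR.

```python
import statistics


def spread_measures(data):
    """Return (range, standard deviation, IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return (
        max(data) - min(data),    # range
        statistics.pstdev(data),  # standard deviation (dividing by N)
        q3 - q1,                  # interquartile range
    )


# Hypothetical data: identical except for one extreme value.
base = [2, 3, 4, 5, 6, 7, 8]
with_outlier = [2, 3, 4, 5, 6, 7, 40]

print(spread_measures(base))
print(spread_measures(with_outlier))
```

In this sketch the range and the standard deviation both grow when the extreme value is introduced, while the IQR stays the same.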

Example

[Figure: histogram of a distribution containing an extreme value]

The above histogram contains an extreme value. Of the three measures of spread, only the IQR is not affected by extreme values, so the IQR is the most appropriate measure of spread here.