Summary statistics and visualisation for one variable

Basic statistics for a single variable

N

This is the size of the sample.

Sample range

Minimum

This is the smallest value in the sample.

Maximum

This is the largest value in the sample.

Range

This is the difference between the maximum and minimum.
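As a minimal sketch of the quantities above, the following computes the sample size, minimum, maximum and range using only built-in Python functions. The data in `sample` is hypothetical and chosen purely for illustration.

```python
# Hypothetical sample, for illustration only.
sample = [4.0, 7.0, 1.0, 9.0, 3.0]

n = len(sample)                   # N: size of the sample
minimum = min(sample)             # smallest value in the sample
maximum = max(sample)             # largest value in the sample
sample_range = maximum - minimum  # difference between maximum and minimum

print(n, minimum, maximum, sample_range)
```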

Median

This is the value below which 50% of the sample lies.

Percentiles

The \(x\)th percentile is the value below which \(x\%\) of the sample lies.

Interquartile range

This is the difference between the \(75\)th percentile and the \(25\)th percentile.
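The median, quartiles and interquartile range can be sketched with Python's standard `statistics` module. The data is hypothetical. There are several conventions for interpolating percentiles in a finite sample, so the choice of `method="inclusive"` here is an assumption; other conventions can give slightly different quartiles.

```python
import statistics

# Hypothetical sample, for illustration only.
sample = [1.0, 3.0, 4.0, 7.0, 9.0]

median = statistics.median(sample)

# n=4 splits the data into four groups and returns the three cut points:
# the 25th, 50th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(median, q1, q3, iqr)
```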

Sample mode

This is the most common value in the sample.
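A short sketch of the mode, again using the standard `statistics` module on hypothetical data. A sample can have more than one most common value, which `multimode` reports as a list.

```python
import statistics

# Hypothetical samples, for illustration only.
sample = [2, 3, 3, 5, 5, 5, 8]
tied_sample = [1, 1, 2, 2, 3]

# The single most common value.
mode = statistics.mode(sample)

# All values tied for most common, in order of first appearance.
modes = statistics.multimode(tied_sample)

print(mode, modes)
```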

Sample moments

Sample mean

We previously defined the population mean as \(\mu=E[X]\).

The sample mean is defined as \(\bar x = \dfrac{1}{n}\sum_i x_i\).
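The definition of the sample mean translates directly into code. This is a minimal sketch on hypothetical data.

```python
# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 6.0, 8.0]

# x_bar = (1/n) * sum of x_i
x_bar = sum(sample) / len(sample)

print(x_bar)
```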

Centred mean

We can subtract the sample mean from each entry in the sample. The resulting sample has a mean of \(0\), which is convenient for many calculations.
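A minimal sketch of centring, using hypothetical data: subtract the sample mean from every entry and check that the new mean is \(0\).

```python
# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 6.0, 8.0]

x_bar = sum(sample) / len(sample)

# Subtract the sample mean from each entry.
centred = [x - x_bar for x in sample]

# The centred sample has mean zero.
centred_mean = sum(centred) / len(centred)

print(centred, centred_mean)
```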

Sample variance

We previously defined the population variance as \(\sigma^2=E[(X-\mu)^2]\).

We define the sample variance as \(\sigma^2=\dfrac{1}{n}\sum_i(x_i-\bar x)^2\).

We can calculate this using matrices, where \(X\) is the column vector of sample values and \(\bar x\) is subtracted from each entry:

\(M=X-\bar x\)

\(\sigma^2=\dfrac{1}{n}M^TM\).
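The matrix form can be sketched in plain Python by treating the sample as a column vector, so that \(M^TM\) is the dot product of the centred vector with itself. The data is hypothetical. The standard library's `statistics.pvariance` also divides by \(n\), so it is used here as a cross-check.

```python
import statistics

# Hypothetical sample, treated as a column vector X.
sample = [2.0, 4.0, 6.0, 8.0]
n = len(sample)

x_bar = sum(sample) / n

# M = X - x_bar (the mean is subtracted from each entry).
m = [x - x_bar for x in sample]

# M^T M is the dot product of M with itself.
variance = sum(v * v for v in m) / n

# Cross-check against the library's divide-by-n variance.
library_variance = statistics.pvariance(sample)

print(variance, library_variance)
```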

Centred variance

If \(\bar x =0\) then:

\(\sigma^2=\dfrac{1}{n}X^TX\).

Other

Standard error

Standard deviation

Sample size

Updating statistics

Updating the mean

\(\bar x_{n+1} = \dfrac{n\bar x_n+x_{n+1}}{n+1}\)
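The update rule lets us maintain the mean over a stream of values without storing them. The sketch below uses a hypothetical helper `update_mean` and hypothetical data, and compares the running result with the mean computed in one pass.

```python
def update_mean(mean_n, n, x_new):
    """Return the mean of n + 1 values given the mean of the first n."""
    return (n * mean_n + x_new) / (n + 1)

# Hypothetical stream of observations.
stream = [2.0, 4.0, 6.0, 8.0]

mean = 0.0
count = 0
for x in stream:
    mean = update_mean(mean, count, x)
    count += 1

batch_mean = sum(stream) / len(stream)
print(mean, batch_mean)
```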

Updating the variance

If it is centred:

\(\sigma_n^2=\dfrac{1}{n}X_n^TX_n\)

So:

\(\sigma_{n+1}^2=\dfrac{n\sigma_n^2 +x_{n+1}^Tx_{n+1}}{n+1}\)
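The centred update can be sketched in the same way as the mean update. The helper `update_centred_variance` and the data are hypothetical. Note the assumption: the formula tracks the average of the squared values, which equals the variance only when the mean is taken to be \(0\) both before and after the new value arrives.

```python
def update_centred_variance(var_n, n, x_new):
    """Update the variance of data assumed to have mean zero."""
    return (n * var_n + x_new * x_new) / (n + 1)

# Hypothetical stream whose mean is zero.
stream = [-3.0, -1.0, 1.0, 3.0]

var = 0.0
count = 0
for x in stream:
    var = update_centred_variance(var, count, x)
    count += 1

# One-pass centred variance: (1/n) X^T X.
batch_var = sum(x * x for x in stream) / len(stream)
print(var, batch_var)
```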

Visualising a single continuous variable

Box and whisker plots

Density plot