# Summary statistics and visualisation for multiple variables

## Statistics for two variables

### Sample covariance

We previously defined the population covariance as \(\sigma_{XY}=E[(X-\mu_X)(Y-\mu_Y)]\).

We define the sample covariance as \(\sigma_{XY}=\dfrac{1}{n}\sum_i(x_i-\bar x)(y_i-\bar y)\).

We can calculate this using matrices:

\(M=X-\bar x\)

\(N=Y-\bar y\)

\(\sigma_{XY}=\dfrac{1}{n}M^TN\).
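The matrix form above can be sketched in plain Python. This is a minimal illustration, assuming \(X\) and \(Y\) are stored as equal-length lists of numbers; the function name `sample_covariance` is hypothetical. For two column vectors, \(M^TN\) is simply the dot product of the centred values.

```python
def sample_covariance(x, y):
    """Sample covariance with the 1/n convention used in these notes."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centred vectors M = X - x_bar and N = Y - y_bar.
    m = [xi - x_bar for xi in x]
    n_vec = [yi - y_bar for yi in y]
    # M^T N is the dot product of the two centred vectors.
    return sum(mi * ni for mi, ni in zip(m, n_vec)) / n


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = sample_covariance(x, y)  # 10 / 4 = 2.5 for this data
```

Many libraries divide by \(n-1\) instead of \(n\) by default, so results may differ slightly from this sketch on small samples.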

### Sample correlation

\(\rho_{XY}=\dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}\)

### Covariance matrix

If we have \(p\) variables, each observed \(n\) times, we can form a \(p\times p\) matrix \(\Sigma\) where:

\(\Sigma_{ij} = \sigma_{ij}=\dfrac{1}{n}(X_i-\bar x_i)^T(X_j-\bar x_j)\)
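As a rough sketch, the matrix can be built entry by entry from centred columns. The layout (a list of columns, one per variable) and the name `covariance_matrix` are assumptions made for illustration only.

```python
def covariance_matrix(columns):
    """Covariance matrix of equal-length columns, using the 1/n convention."""
    n_obs = len(columns[0])
    if n_obs == 0 or any(len(col) != n_obs for col in columns):
        raise ValueError("all columns must be non-empty and of equal length")
    # Centre each variable by subtracting its own mean.
    centred = []
    for col in columns:
        mean = sum(col) / n_obs
        centred.append([v - mean for v in col])
    # Entry (i, j) is the dot product of centred columns i and j, divided by n.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n_obs for cj in centred]
        for ci in centred
    ]


data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
sigma = covariance_matrix(data)
```

The diagonal entries are the variances of the individual variables, and the matrix is symmetric because \(\sigma_{ij}=\sigma_{ji}\).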

### Centred covariance

If \(\bar x = \bar y = 0\) then:

\(\sigma_{XY}=\dfrac{1}{n}X^TY\)

### Correlation matrix

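The correlation matrix has the same layout as the covariance matrix, but each entry is the correlation \(\rho_{ij}=\dfrac{\sigma_{ij}}{\sigma_i\sigma_j}\) rather than the covariance, so every diagonal entry is \(1\).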
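One way to obtain a correlation matrix is to rescale an existing covariance matrix by the standard deviations on its diagonal. The sketch below assumes the covariance matrix is a nested list with strictly positive diagonal entries; the function name and the example numbers are hypothetical.

```python
import math


def correlation_from_covariance(sigma):
    """Rescale a covariance matrix so that each entry becomes a correlation."""
    size = len(sigma)
    # Standard deviations are the square roots of the diagonal (the variances).
    # A constant variable has zero variance and would cause a division by zero.
    sd = [math.sqrt(sigma[i][i]) for i in range(size)]
    return [
        [sigma[i][j] / (sd[i] * sd[j]) for j in range(size)]
        for i in range(size)
    ]


sigma = [
    [4.0, 2.0],
    [2.0, 9.0],
]
corr = correlation_from_covariance(sigma)  # off-diagonal entries are 2 / (2 * 3)
```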

## Correlation coefficients

### Pearson correlation coefficient

The Pearson correlation coefficient is defined as the covariance normalised by the product of the individual standard deviations.

It lies between \(-1\) (total negative linear correlation) and \(1\) (total positive linear correlation), with \(0\) indicating no linear correlation.

\(\rho_{X,Y}=\dfrac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}\)
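The three reference values can be checked with a small sketch. It assumes equal-length lists of numbers with non-zero variance and uses the \(1/n\) convention throughout; the function name `pearson` is chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])         # y increases linearly with x
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])         # y decreases linearly with x
r_none = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2, symmetric about 0
```

The last case gives a coefficient of zero even though \(Y\) is fully determined by \(X\): the coefficient measures only linear association.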

### Spearman rank correlation

For each of the two variables, we rank the observations from smallest to largest.

From \(X\) and \(Y\) we then have the rank vectors \(R_X\) and \(R_Y\).

We then calculate the Pearson correlation coefficient between the rankings.

\(r_S=\dfrac{cov(R_X, R_Y)}{\sigma_{R_X}\sigma_{R_Y}}\)
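A minimal sketch of this procedure follows. It assumes ties receive the average of the ranks they span, which is a common convention but not one stated in these notes; all function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation with the 1/n convention."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone in x, but not linear
r_s = spearman(x, y)
r_p = pearson(x, y)
```

Because \(Y\) increases whenever \(X\) does, the rank correlation is \(1\) even though the Pearson coefficient on the raw values is below \(1\).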

### Kendall rank correlation

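Consider every pair of observations \((x_i, y_i)\) and \((x_j, y_j)\). The pair is concordant if the two variables order it the same way, that is \((x_i-x_j)(y_i-y_j)>0\), and discordant if they order it in opposite ways, that is \((x_i-x_j)(y_i-y_j)<0\).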

\(\tau = \dfrac{n_{concordant}-n_{discordant}}{\begin{pmatrix}n\\2\end{pmatrix}}\)
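The count can be sketched by looping over all pairs of observations. This version assumes that tied pairs count as neither concordant nor discordant, which matches the formula above (often called tau-a); the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation from concordant and discordant pair counts."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have equal length of at least 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie and is counted as neither.
    n_pairs = n * (n - 1) // 2  # n choose 2
    return (concordant - discordant) / n_pairs


tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant
```

The double loop costs \(O(n^2)\) comparisons, which is fine for small samples but slow for large ones.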

### General correlation coefficient

## Updating statistics

### Updating the covariance

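If the data are centred:

\(\sigma_{XY}^n=\dfrac{1}{n}X_n^TY_n\)

A new observation \((x_{n+1}, y_{n+1})\) adds one term to the product, \(X_{n+1}^TY_{n+1}=X_n^TY_n+x_{n+1}^Ty_{n+1}=n\sigma_{XY}^n+x_{n+1}^Ty_{n+1}\), so:

\(\sigma_{XY}^{n+1}=\dfrac{n\sigma^n_{XY}+x_{n+1}^Ty_{n+1}}{n+1}\)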
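The update can be sketched as a one-line function and compared with the batch formula. The example assumes scalar observations whose means are already zero; the function name and the data are made up for illustration.

```python
def update_centred_covariance(cov_n, n, x_new, y_new):
    """Covariance of n + 1 centred observations, given the covariance of the first n."""
    return (n * cov_n + x_new * y_new) / (n + 1)


# Both lists sum to zero, so the data are centred.
xs = [-1.0, 0.5, 2.0, -1.5]
ys = [2.0, -1.0, -3.0, 2.0]

cov = 0.0
for n, (x_new, y_new) in enumerate(zip(xs, ys)):
    # n is the number of observations already included in cov.
    cov = update_centred_covariance(cov, n, x_new, y_new)

# Batch version for comparison: (1/n) X^T Y over all observations at once.
batch = sum(a * b for a, b in zip(xs, ys)) / len(xs)
```

Only the current covariance and the count need to be stored. Note that the update is exact only if the means stay at zero as new observations arrive; if the means have to be estimated from the data, they need to be updated as well.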

## Visualising multiple continuous variables

### Time series

### Scatter plots (with size as variable)

### Q-Q plots

Plot the quantiles of one variable against the corresponding quantiles of the other.
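The points of such a plot can be computed without any plotting library. The sketch below assumes Python 3.8 or later for `statistics.quantiles` and uses nine cut points (deciles) by default; the function name `qq_points` and the sample data are hypothetical, and drawing the points is left to whichever plotting tool is in use.

```python
import statistics


def qq_points(x, y, n_quantiles=10):
    """Pair up matching quantiles of two samples, which may differ in size."""
    # statistics.quantiles returns n_quantiles - 1 cut points for each sample.
    qx = statistics.quantiles(x, n=n_quantiles)
    qy = statistics.quantiles(y, n=n_quantiles)
    return list(zip(qx, qy))


x = [float(v) for v in range(1, 21)]        # 20 evenly spaced values
y = [2.0 * v + 1.0 for v in range(1, 41)]   # 40 evenly spaced values on another scale
points = qq_points(x, y)
```

If the two samples come from distributions of the same shape, the points lie close to a straight line; systematic curvature suggests that the shapes differ.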

## Visualising a single class variable

### Bar and column charts

### Pie charts

## Visualising multiple class variables

### Stacked bar and column charts

## Visualising class and continuous variables

### Multiple box and whiskers

### Scatter plots with colour

## Visualising geographic data

## Visualising time series

### Heat maps

### Sparklines