Dimensionality reduction with Principal Component Analysis (PCA)

Dimensionality reduction

Classical principal component analysis

Introduction

Principal component analysis takes a dataset \(X\) of \(n\) observations of \(m\) variables and returns a principal component weight matrix \(A\) of size \(m\times k\), with \(k\le m\).

Each new dimension is a linear combination of the original variables: \(Z=XA\).

The new dimensions are mutually uncorrelated, and are ordered by descending explained variance.

The problem of principal component analysis is to find these weightings \(A\).
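To make the setup concrete, here is a minimal sketch in Python (NumPy and scikit-learn are my choice of tooling here, not something these notes prescribe, and the data is synthetic): fit a PCA, read off the weights \(A\), and check that the projected dimensions \(Z=XA\) are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # 200 samples, m=5 correlated variables

k = 2
pca = PCA(n_components=k).fit(X)
A = pca.components_.T                 # the m x k weight matrix A
Z = (X - X.mean(axis=0)) @ A          # Z = XA on the centred data

# Off-diagonal covariances are ~0: the new dimensions are uncorrelated
print(np.round(np.cov(Z, rowvar=False), 3))
```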

Classical PCA

We centre the data, form its covariance matrix, and take the first \(k\) eigenvectors, ordered by descending eigenvalue; these become the columns of \(A\).
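A direct implementation of this recipe might look like the following sketch (NumPy only; the function name and interface are mine):

```python
import numpy as np

def pca_eig(X, k):
    """Classical PCA: first k eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)               # centre each variable
    C = np.cov(Xc, rowvar=False)          # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh, since C is symmetric
    order = np.argsort(eigvals)[::-1]     # descending eigenvalue order
    A = eigvecs[:, order[:k]]             # m x k weight matrix
    return Xc @ A, A                      # scores Z and weights A
```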

Getting the eigenvectors using SVD

We can decompose the centred data as \(X=U\Sigma A^T\), its singular value decomposition.

The columns of \(A\) (the right singular vectors) are exactly the eigenvectors of the covariance matrix, since \(X^TX=A\Sigma^2A^T\), and they come already ordered by descending singular value.
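In code, the same sketch via SVD (again NumPy only; note that np.linalg.svd returns \(A^T\), with singular values already sorted in descending order):

```python
import numpy as np

def pca_svd(X, k):
    """Classical PCA via the SVD X = U Sigma A^T of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, At = np.linalg.svd(Xc, full_matrices=False)
    A = At.T[:, :k]       # right singular vectors = eigenvectors of X^T X
    return Xc @ A, A
```

This avoids forming the \(m\times m\) covariance matrix explicitly, which is also numerically better behaved.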

Choosing the number of dimensions

We can choose the smallest \(k\) such that a chosen percentage of the total variance is retained; the variance explained by component \(i\) is proportional to its eigenvalue (equivalently, to its squared singular value).
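One way to implement that rule (the 95% default below is an assumption, a common convention rather than anything these notes fix):

```python
import numpy as np

def choose_k(X, threshold=0.95):
    """Smallest k whose components retain at least `threshold` of the variance."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)       # fraction of variance per component
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)
```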

Robust principal component analysis

Robust PCA

Robust PCA can be used to deal with corrupted data, such as corrupted image data.

Rather than clean data \(X\), we observe \(M=L_0+S_0\), where \(L_0\) is what we want to recover (and is low rank) and \(S_0\) is the corruption (and is sparse).

In video footage, \(L_0\) can correspond to the background, while \(S_0\) corresponds to movement.
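A rough sketch of one standard way to recover this split is Principal Component Pursuit, which minimises \(\|L\|_*+\lambda\|S\|_1\) subject to \(L+S=M\), using alternating shrinkage steps in an ADMM-style loop. The \(\lambda\) and \(\mu\) defaults below follow common heuristics and are assumptions, not something these notes specify.

```python
import numpy as np

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal Component Pursuit sketch: split M into low-rank L + sparse S."""
    n1, n2 = M.shape
    lam = lam or 1.0 / np.sqrt(max(n1, n2))        # common weighting heuristic
    mu = mu or n1 * n2 / (4.0 * np.abs(M).sum())   # common step-size heuristic
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                           # dual variable

    def shrink(A, tau):                            # soft-thresholding operator
        return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

    for _ in range(max_iter):
        # singular value thresholding -> low-rank update
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(s, 1.0 / mu)) @ Vt
        # entrywise soft-thresholding -> sparse update
        S = shrink(M - L + Y / mu, lam / mu)
        # dual ascent on the constraint L + S = M
        R = M - L - S
        Y += mu * R
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S
```

On video data arranged with one vectorised frame per column, \(L\) would then hold the (low-rank) background and \(S\) the (sparse) moving foreground.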