The central limit theorem and the Gaussian (normal) distribution

Central limit theorem

Central limit theorem

Generalises the weak law of large numbers

Characteristic function of summed IID events

\(Z=\sum_{i=1}^nY_i\)

\(\phi_Z(t)=E[e^{itZ}]\)

\(\phi_Z(t)=E[e^{it\sum_{i=1}^nY_i}]=E[\prod_{i=1}^ne^{itY_i}]\)

\(\phi_Z(t)=\prod_{i=1}^nE[e^{itY_i}]\), by independence

\(\phi_Z(t)=E[e^{itY}]^n\), because the \(Y_i\) are identically distributed

\(\phi_Z(t)=\phi_Y(t)^n\)
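A quick numerical sanity check of this identity; Bernoulli(0.5) is just a hypothetical choice of \(Y\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 5, 200_000, 0.7

# Characteristic function of a single Bernoulli(0.5): E[e^{itY}]
y = rng.integers(0, 2, size=trials)
phi_y = np.mean(np.exp(1j * t * y))

# Characteristic function of Z = sum of n IID copies, estimated directly
z = rng.integers(0, 2, size=(trials, n)).sum(axis=1)
phi_z = np.mean(np.exp(1j * t * z))

# Exact value: phi_Y(t) = (1 + e^{it}) / 2, so phi_Z(t) = phi_Y(t)^n
phi_y_exact = (1 + np.exp(1j * t)) / 2
print(abs(phi_z - phi_y_exact**n))  # small Monte Carlo error
```

The direct estimate of \(\phi_Z\) matches \(\phi_Y(t)^n\) up to sampling noise.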

Taylor series: the first moments dominate

\(Z=\sum_{i=1}^nY_i\)

\(Y=\dfrac{X}{n}\)

\(\phi_Z(t)=\phi_Y(t)^n\)

\(\phi_Z(t)=\phi_{\dfrac{X}{n}}(t)^n\)

\(\phi_Z(t)=\phi_X(\dfrac{t}{n})^n\)

\(\phi_X(t)=1+it\mu_X -\dfrac{(\mu_X^2 +\sigma_X^2 )t^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[X^j](it)^j}{j!}\), using \(E[X^2]=\mu_X^2 +\sigma_X^2\)

\(\phi_X(\dfrac{t}{n})=1+i\dfrac{t\mu_X }{n}-\dfrac{(\mu_X^2 +\sigma_X^2 )(\dfrac{t}{n})^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[X^j](i\dfrac{t}{n})^j}{j!}\)

\(\phi_X(\dfrac{t}{n})=1+i\dfrac{t\mu_X }{n}-\dfrac{(\mu_X^2 +\sigma_X^2 )t^2}{2n^2} +\sum_{j=3}^{\infty }\dfrac{E[X^j](i\dfrac{t}{n})^j}{j!}\)

Eliminating the imaginary term

We want \(\mu_Y\) to be \(0\), so that the imaginary first-order term vanishes.

\(Z=\sum_{i=1}^nY_i\)

\(Y=\dfrac{X-\mu_X }{n}\)

\(\phi_Y(t)=1+it\mu_Y -\dfrac{(\mu_Y^2 +\sigma_Y^2 )t^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[Y^j](it)^j}{j!}\)

\(\mu_Y =E[\dfrac{X-\mu_X }{n}] =\dfrac{\mu_X -\mu_X }{n}=0\)

\(\phi_Y(t)=1-\dfrac{\sigma_Y^2t^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[Y^j](it)^j}{j!}\)

\(\sigma^2_Y =E[(\dfrac{X-\mu_X }{n})^2]\)

\(\sigma^2_Y =E[\dfrac{X^2+\mu^2_X-2X\mu_X }{n^2}]\)

\(\sigma^2_Y =\dfrac{E[X^2]+\mu^2_X-2E[X]\mu_X }{n^2}\)

\(\sigma^2_Y =\dfrac{E[X^2]-\mu^2_X}{n^2}\)

\(\sigma^2_Y =\dfrac{\sigma^2_X}{n^2}\)

\(\phi_Y(t)=1-\dfrac{\sigma_X^2t^2}{2n^2} +\sum_{j=3}^{\infty }\dfrac{E[(\dfrac{X-\mu_X}{n})^j](it)^j}{j!}\)

\(\phi_Z(t)=\phi_Y(t)^n\)

\(\phi_Z(t)=[1-\dfrac{\sigma_X^2t^2}{2n^2} +\sum_{j=3}^{\infty }\dfrac{E[(\dfrac{X-\mu}{n})^j](it)^j}{j!}]^n\)

The terms with \(j\geq 3\) carry factors of \(\dfrac{1}{n^j}\) and so vanish faster than the quadratic term as \(n\) grows, leaving:

\(\phi_Z(t)=[1-\dfrac{\sigma_X^2t^2}{2n^2}]^n\)

Eliminating \(\sigma^2\)

\(Z=\sum_{i=1}^nY_i\)

\(Y=\dfrac{X-\mu_X }{\sigma n}\)

\(\phi_Y(t)=1+it\mu_Y -\dfrac{(\mu_Y^2 +\sigma_Y^2 )t^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[Y^j](it)^j}{j!}\)

\(\mu_Y =E[\dfrac{X-\mu_X }{\sigma_X n}] =\dfrac{\mu_X -\mu_X }{\sigma_X n}=0\)

\(\phi_Y(t)=1-\dfrac{\sigma_Y^2t^2}{2} +\sum_{j=3}^{\infty }\dfrac{E[Y^j](it)^j}{j!}\)

\(\sigma^2_Y =E[(\dfrac{X-\mu_X }{\sigma_X n})^2]\)

\(\sigma^2_Y =E[\dfrac{X^2+\mu^2_X-2X\mu_X }{\sigma_X^2 n^2}]\)

\(\sigma^2_Y =\dfrac{E[X^2]+\mu^2_X-2E[X]\mu_X }{\sigma_X^2 n^2}\)

\(\sigma^2_Y =\dfrac{E[X^2]-\mu^2_X}{\sigma_X^2 n^2}\)

\(\sigma^2_Y =\dfrac{\sigma^2_X}{\sigma_X^2 n^2}\)

\(\sigma^2_Y =\dfrac{1}{n^2}\)

\(\phi_Y(t)=1-\dfrac{t^2}{2n^2} +\sum_{j=3}^{\infty }\dfrac{E[(\dfrac{X-\mu_X}{\sigma_X n})^j](it)^j}{j!}\)

\(\phi_Z(t)=\phi_Y(t)^n\)

\(\phi_Z(t)=[1-\dfrac{t^2}{2n^2} +\sum_{j=3}^{\infty }\dfrac{E[(\dfrac{X-\mu_X}{\sigma_X n})^j](it)^j}{j!}]^n\)

Again the higher-order terms vanish faster as \(n\) grows, leaving:

\(\phi_Z(t)=[1-\dfrac{t^2}{2n^2}]^n\)

Preparing for exponential expansion

We know that

\([1+\dfrac{x}{n}]^n\rightarrow e^x\)

As \(n \rightarrow \infty\).
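A quick numerical check of this limit; \(x=-0.5\) is just an example value:

```python
import math

x = -0.5  # any real x works; this value is an example
gaps = [abs((1 + x / n) ** n - math.exp(x)) for n in (10, 1_000, 100_000)]
print(gaps)  # the gap to e^x shrinks roughly like 1/n
```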

With:

\(Z=\sum_{i=1}^nY_i\)

\(Y=\dfrac{X-\mu_X }{\sigma n}\)

We have:

\(\phi_Z(t)=[1-\dfrac{t^2}{2n^2}]^n\)

This tends to \(1\): dividing by \(n\) shrinks too fast, and the limit is degenerate. Dividing by \(\sqrt n\) instead gives a non-trivial limit.

With:

\(Z=\sum_{i=1}^nY_i\)

\(Y=\dfrac{X-\mu_X }{\sigma \sqrt n}\)

We have:

\(\phi_Z(t)=[1-\dfrac{t^2}{2n}]^n\)

Which tends towards

\(\phi_Z(t)=e^{-\dfrac{1}{2}t^2}\)
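A numerical check that \([1-\dfrac{t^2}{2n}]^n\) does approach \(e^{-\frac{1}{2}t^2}\); \(t=1.3\) is an arbitrary example point:

```python
import math

t = 1.3  # arbitrary example point
exact = math.exp(-t**2 / 2)
vals = [(1 - t**2 / (2 * n)) ** n for n in (10, 100, 10_000)]
print(vals, exact)  # vals approach the exact limit as n grows
```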

Rescaling

The sum of \(n\) IID random variables, each less its mean and divided by the standard deviation times \(\sqrt n\), follows a standard normal distribution as \(n\) increases.

What does this say about the actual distribution of sample averages?

\(Z=\sum_{i=1}^nY_i\)

\(Y_i=\dfrac{X_i-\mu_X }{\sigma_X \sqrt n}\)

Recall that the sample average is \(\sum_{i=1}^n\dfrac{X_i}{n}\); we want to connect \(Z\) to this.

Let’s create \(Q\).

\(Q=\dfrac{Z\sigma_X }{\sqrt n}+\mu_X\)

\(Q=\dfrac{(\sum_{i=1}^nY_i)\sigma_X }{\sqrt n}+\mu_X\)

\(Q=\dfrac{(\sum_{i=1}^n(\dfrac{X_i-\mu_X }{\sigma_X \sqrt n}))\sigma_X }{\sqrt n}+\mu_X\)

\(Q=\sum_{i=1}^n(\dfrac{X_i-\mu_X }{n})+\mu_X\)

\(Q=\sum_{i=1}^n(\dfrac{X_i-\mu_X }{n}+\dfrac{\mu_X}{n})\)

\(Q=\sum_{i=1}^n(\dfrac{X_i}{n})\)

This is the sample average.

\(\phi_Q(t)=\phi_{\dfrac{Z\sigma_X }{\sqrt n}+\mu_X}(t)\)

\(\phi_Q(t)=\phi_Z(\dfrac{t\sigma_X }{\sqrt n})e^{it\mu_X}\)

\(\phi_Z(\dfrac{t\sigma_X }{\sqrt n})=e^{-\dfrac{1}{2}(\dfrac{t\sigma_X }{\sqrt n})^2}\)

\(\phi_Z(\dfrac{t\sigma_X }{\sqrt n})=e^{-\dfrac{1}{2}\dfrac{t^2\sigma^2_X }{n}}\)

\(\phi_Q(t)=e^{-\dfrac{1}{2}\dfrac{t^2\sigma^2_X }{n}}e^{it\mu_X}\)
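This characteristic function says the sample average \(Q\) is approximately \(N(\mu_X, \dfrac{\sigma^2_X}{n})\). A simulation sketch, using an exponential distribution as a hypothetical choice of \(X\) (so \(\mu_X=\sigma^2_X=1\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 100_000

# Hypothetical choice of X: exponential with mean 1, so mu_X = sigma_X^2 = 1
x = rng.exponential(scale=1.0, size=(trials, n))
q = x.mean(axis=1)  # Q, the sample average

print(q.mean())      # close to mu_X = 1
print(q.var() * n)   # close to sigma_X^2 = 1, i.e. var(Q) ~ sigma_X^2 / n
```

The sample averages have mean \(\mu_X\) and variance \(\dfrac{\sigma^2_X}{n}\), as the characteristic function predicts.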

Normal distribution

We define the normal distribution \(N(\mu_X, \sigma^2_X)\) as the distribution whose characteristic function is this expression at \(n=1\):

\(\phi(t)=e^{-\dfrac{1}{2}\dfrac{t^2\sigma^2_X }{n}}e^{it\mu_X}\)

\(\phi(t)=e^{-\dfrac{1}{2}t^2\sigma^2_X }e^{it\mu_X}\)

Getting the probability distribution function

\(\phi_X(t)=e^{-\dfrac{1}{2}t^2\sigma^2_X} e^{it\mu_X}\)

\(\phi_X(t)=e^{-\dfrac{1}{2}t^2\sigma^2_X}[\cos (t\mu_X )+i\sin (t\mu_X)]\)
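A numerical check that this really is the characteristic function of the normal distribution, comparing the empirical \(E[e^{itX}]\) from normal samples against the formula (the parameter values are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, t = 2.0, 0.5, 1.1  # example parameters

# Empirical characteristic function E[e^{itX}] from normal samples
samples = rng.normal(mu, sigma, size=500_000)
phi_emp = np.mean(np.exp(1j * t * samples))

# The formula derived above
phi_theory = np.exp(-0.5 * t**2 * sigma**2 + 1j * t * mu)
print(abs(phi_emp - phi_theory))  # small Monte Carlo error
```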

Convergence

Convergence in distribution (weak convergence)

Convergence in probability and o-notation

Introduction

Converges in probability

\(P(d(X_n, X)>\epsilon )\rightarrow 0\)

For all \(\epsilon > 0\), where \(d\) is a distance metric.

\(X_n \rightarrow^P X\)
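A simulation sketch of convergence in probability, taking \(X_n\) to be the mean of \(n\) uniforms on \([0,1]\) (so the limit \(X\) is the constant \(\dfrac{1}{2}\)):

```python
import numpy as np

rng = np.random.default_rng(3)
eps, trials = 0.1, 20_000

# X_n = mean of n uniforms on [0, 1]; the limit X is the constant 1/2
probs = {}
for n in (10, 100, 1_000):
    means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(means - 0.5) > eps)
print(probs)  # estimated P(|X_n - X| > eps) falls towards 0
```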

Little o notation

Little o notation is used to describe convergence in probability.

\(X_n=o_p(a_n)\)

means that

\(\dfrac{X_n}{a_n}\)

converges in probability to \(0\) as \(n\) approaches infinity.

This can be written:

\(\dfrac{X_n}{a_n}=o_p(1)\)

Big O notation

Big O notation is used to describe boundedness.

\(X_n=O_p(a_n)\)

means that \(\dfrac{X_n}{a_n}\) is bounded in probability: for every \(\epsilon > 0\) there is a finite \(M\) such that \(P(|\dfrac{X_n}{a_n}|>M)<\epsilon\) for all sufficiently large \(n\).

If something is little o, it is big O.

Almost sure convergence

\(X_n\) converges almost surely to \(X\) if:

\(P(\lim_{n\rightarrow \infty }d(X_n, X)= 0)=1\)

Where \(d(X_n, X)\) is a distance metric.

\(X_n\rightarrow^{as} X\)

Gaussian distributions

Gaussian

\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2 }} e^{-\dfrac{(x-\mu)^2}{2\sigma^2 }}\)
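A numerical check that this density integrates to \(1\), using a simple trapezoidal rule (the parameter values are arbitrary examples):

```python
import numpy as np

mu, sigma = 1.0, 2.0  # example parameters
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
f = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Trapezoidal rule: a probability density should integrate to 1
area = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))
print(area)  # ~ 1.0
```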

The error function and the complementary error function

Multivariable Gaussian distribution

Definition

For univariate:

\(x \sim N(\mu, \sigma^2 )\)

We define the multivariate Gaussian distribution as the distribution in which every linear combination of components is Gaussian.

For multivariate:

\(X \sim N(\mu, \Sigma )\)

Where \(\mu\) is now a vector, and \(\Sigma\) is the covariance matrix.

The density function is:

\(f_x=\dfrac{1}{\sqrt {(2\pi )^n|\Sigma |}} e^{-\dfrac{1}{2}(x-\mu )^T\Sigma^{-1}(x-\mu)}\)

For the univariate Gaussian it is:

\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2}} e^{-\dfrac{1}{2\sigma^2}(x-\mu )^2}\)

This is the same where \(n=1\).
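A quick check that the two formulas agree at \(n=1\); the helper `mvn_pdf` is just an illustrative implementation of the density above:

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Multivariate Gaussian density, from the formula above."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

# With n = 1 this reduces to the univariate density
mu, sigma, x0 = 1.0, 2.0, 0.3  # example values
uni = np.exp(-(x0 - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
multi = mvn_pdf(np.array([x0]), np.array([mu]), np.array([[sigma**2]]))
print(uni, multi)  # the two values agree
```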

Singular Gaussians

The density requires the determinant \(|\Sigma |\) and the inverse \(\Sigma^{-1}\), both of which exist only when the covariance matrix is non-degenerate (full rank).

If the covariance matrix is degenerate we can instead use the pseudo-inverse and the pseudo-determinant.
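A sketch of both quantities for a degenerate covariance matrix, using numpy's Moore-Penrose pseudo-inverse and taking the pseudo-determinant as the product of the non-zero eigenvalues (the matrix is an example):

```python
import numpy as np

# A rank-deficient (degenerate) example covariance matrix
cov = np.array([[1.0, 1.0],
                [1.0, 1.0]])

pinv = np.linalg.pinv(cov)  # Moore-Penrose pseudo-inverse

# Pseudo-determinant: product of the non-zero eigenvalues
eigvals = np.linalg.eigvalsh(cov)
pdet = np.prod(eigvals[eigvals > 1e-12])

print(pdet)  # 2.0: the eigenvalues are 0 and 2
print(pinv)  # [[0.25, 0.25], [0.25, 0.25]]
```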