Parametric distributions


Degenerate distribution

Discrete uniform distribution

There is a finite set \(s\) such that each element is equally likely:

\(P(X=x)=\dfrac{1}{|s|}\) for \(x\in s\)

\(P(X=x)=0\) for \(x\not\in s\)

Moments of the uniform distribution

The mean is the mean of the set \(s\).

If the set is all numbers of the real line between two values, \(a\) and \(b\), then:

The mean is \(\dfrac{1}{2}(a+b)\).

The variance is \(\dfrac{(b-a)^2}{12}\) in the continuous case.

The variance is \(\dfrac{(b-a+1)^2-1}{12}\) in the discrete case.
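A quick numerical check of these formulas (a Python sketch using only the standard library; the values \(a=2\), \(b=10\) are arbitrary):

```python
import random
import statistics

random.seed(0)

# Continuous uniform on [a, b]: mean (a+b)/2, variance (b-a)^2 / 12.
a, b = 2.0, 10.0
samples = [random.uniform(a, b) for _ in range(200_000)]
cont_mean = statistics.mean(samples)
cont_var = statistics.pvariance(samples)

# Discrete uniform on the integers a..b: variance ((b-a+1)^2 - 1) / 12.
values = list(range(2, 11))                  # integers 2..10 inclusive
disc_mean = sum(values) / len(values)
disc_var = sum((v - disc_mean) ** 2 for v in values) / len(values)

print(cont_mean, cont_var)   # close to 6.0 and 5.33
print(disc_mean, disc_var)   # exactly 6.0 and 80/12 ≈ 6.67
```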

Bernoulli distribution

The outcome of a Bernoulli trial is either \(0\) or \(1\). We can describe it as:

\(P(X=1)=p\)

\(P(X=0)=1-p\)

With a single parameter \(p\).

Moments of the Bernoulli distribution

The mean of a Bernoulli trial is \(E[X]=(1-p)(0)+(p)(1)=p\).

The variance of a Bernoulli trial is \(E[(X-\mu)^2]=(1-p)(0-\mu)^2+(p)(1-\mu)^2=(1-p)p^2+p(1-p)^2=p(1-p)\).

Binomial distribution

If we repeat a Bernoulli trial \(n\) times with the same parameter and sum the results, we have the binomial distribution.

We therefore have two parameters, \(p\) and \(n\).

\(P(X=x)={n\choose x }p^x(1-p)^{n-x}\)

Moments of the binomial distribution

The mean is \(np\), which follows from summing the means of the \(n\) independent trials.

Similarly, because the trials are independent, the variances can be added together, giving \(np(1-p)\).
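These moments can be verified directly from the mass function (a Python sketch using the standard library; \(n=20\), \(p=0.3\) are arbitrary choices):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 20, 0.3
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))
print(mean, var)  # n*p = 6.0 and n*p*(1-p) = 4.2
```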

Poisson distribution


We can use the Poisson distribution to model the number of independent events that occur in a fixed time period.

For a very short time period \(\delta t\), observing an event is a Bernoulli trial with a small success probability \(p\).



Chance of no observations

Let’s consider the chance of observing no events by time \(t\): \(P(0;t)\).

We can see that: \(P(0;t+\delta t)=P(0;t)(1-p)\).

And therefore:

\(P(0;t+\delta t)-P(0;t)=-pP(0;t)\)

By setting \(p=\lambda \delta t\):

\(\dfrac{P(0;t+\delta t)-P(0;t)}{\delta t}=-\lambda P(0;t)\)

Taking the limit \(\delta t\to 0\):

\(\dfrac{dP(0;t)}{dt}=-\lambda P(0;t)\)

\(P(0;t)=Ce^{-\lambda t}\)

If \(t=0\) then \(P(0;t)=1\) and so \(C=1\).

\(P(0;t)=e^{-\lambda t}\)
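This limit can be checked numerically by slicing \([0,t]\) into many short Bernoulli trials (a Python sketch; \(\lambda=2\), \(t=1.5\) are arbitrary values):

```python
import math

lam, t = 2.0, 1.5
n_slices = 1_000_000
p = lam * t / n_slices            # per-slice event probability, p = λ·δt
prob_zero = (1 - p) ** n_slices   # chance of no event in any slice
print(prob_zero, math.exp(-lam * t))  # both ≈ 0.0498
```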

Deriving the Poisson distribution

Multinomial distribution

Binomial recap

The mass function for the binomial case is:

\(f(x)={n\choose x }p^x(1-p)^{n-x}\)
The multinomial distribution

This generalises the binomial distribution to the case where there are more than \(2\) outcomes.

\(f(x_1,...,x_k)=\dfrac{n!}{\prod_i x_i!}\prod_i p_i^{x_i}\)

Where \(\sum_i x_i=n\) and \(\sum_i p_i=1\).
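A direct implementation of this mass function (a Python sketch; with two outcomes it should reduce to the binomial mass function):

```python
from math import comb, factorial

def multinomial_pmf(xs, ps):
    # f(x_1..x_k) = n! / (x_1!···x_k!) * p_1^{x_1}···p_k^{x_k}
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)
    prob = 1.0
    for x, p in zip(xs, ps):
        prob *= p ** x
    return coef * prob

# With two outcomes this reduces to the binomial mass function.
two = multinomial_pmf([3, 7], [0.4, 0.6])
print(abs(two - comb(10, 3) * 0.4 ** 3 * 0.6 ** 7) < 1e-12)  # True
```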

Continuous distributions

Exponential distribution

Weibull distribution

Power law

\(P(X)=\dfrac{\alpha -1}{a}(\dfrac{x}{a})^{-\alpha }\)

Where \(a\) is the lower bound.

\(P(X)=0\) for \(X<a\).

Moments of the power law

\(E[X^m]=\dfrac{\alpha - 1}{\alpha -1 -m }a^m\)

If \(m\ge \alpha -1 \) then this is not well defined.

Higher-order moments, such as the variance, cannot then be identified.
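A numerical check of the first moment by inverse-CDF sampling (a Python sketch; \(a=1\), \(\alpha=3.5\) are arbitrary choices with \(m=1<\alpha-1\), so the moment is finite):

```python
import random

random.seed(1)
a, alpha = 1.0, 3.5
# Inverse-CDF sampling: F(x) = 1 - (x/a)^(1-alpha), so x = a * u^(-1/(alpha-1))
# for u uniform on (0, 1].
samples = [a * (1.0 - random.random()) ** (-1 / (alpha - 1))
           for _ in range(500_000)]

m = 1
empirical = sum(s ** m for s in samples) / len(samples)
theoretical = (alpha - 1) / (alpha - 1 - m) * a ** m  # finite only if m < alpha-1
print(empirical, theoretical)  # both ≈ 1.67
```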

Logistic distribution

The logistic distribution has the cumulative distribution function:

\(F(x)=\dfrac{1}{1+e^{-\dfrac{x-\mu }{s}}} \)

Lévy distribution


The Lévy distribution is a continuous probability distribution.

The probability density function is:

\(P(X)=\sqrt {\dfrac{c}{2\pi }}\dfrac{e^{-\dfrac{c}{2(x-\mu )}}}{(x-\mu )^{\dfrac{3}{2}}}\)
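One way to check this density numerically (a Python sketch, assuming the standard construction that \(c/Z^2\) follows a Lévy distribution with location \(0\) and scale \(c\) when \(Z\sim N(0,1)\), and that the Lévy CDF is \(\operatorname{erfc}(\sqrt{c/2x})\)):

```python
import math
import random

random.seed(2)
c = 1.0
# Assumed construction: if Z ~ N(0, 1) then c / Z^2 is Lévy(0, c).
samples = [c / random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]

x = 2.0
empirical = sum(s <= x for s in samples) / len(samples)
theoretical = math.erfc(math.sqrt(c / (2 * x)))   # Lévy CDF at x
print(empirical, theoretical)  # both ≈ 0.48
```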

Gaussian distributions


\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2 }} e^{-\dfrac{(x-\mu)^2}{2\sigma^2 }}\)

The error function and the complementary error function

The error function is \(\operatorname{erf}(x)=\dfrac{2}{\sqrt \pi }\int_0^x e^{-t^2}dt\), and the complementary error function is \(\operatorname{erfc}(x)=1-\operatorname{erf}(x)\).

Multivariable Gaussian distribution


For univariate:

\(x \sim N(\mu, \sigma^2 )\)

We define the multivariate Gaussian distribution as the distribution where any linear combination of the components is Gaussian.

For multivariate:

\(X \sim N(\mu, \Sigma )\)

Where \(\mu \) is now a vector, and \(\Sigma \) is the covariance matrix.

The density function is:

\(f_x=\dfrac{1}{\sqrt {(2\pi )^n|\Sigma |}} e^{-\dfrac{1}{2}(x-\mu )^T\Sigma^{-1}(x-\mu)}\)

For the univariate case it is:

\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2}} e^{-\dfrac{1}{2\sigma^2}(x-\mu )^2}\)

This is the same as the general formula when \(n=1\).
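The density can be evaluated directly; a sketch of the \(n=2\) case in plain Python (the check uses the fact that with a diagonal covariance the joint density factorises into univariate normals):

```python
import math

def mvn_pdf_2d(x, mu, Sigma):
    # f(x) = (2π)^(-n/2) |Σ|^(-1/2) exp(-(1/2)(x-μ)^T Σ^{-1} (x-μ)), n = 2
    (s11, s12), (s21, s22) = Sigma
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 2 * det)

def norm_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# With a diagonal Σ the joint density factorises into univariate normals.
dens = mvn_pdf_2d([1.0, -0.5], [0.0, 0.0], [[2.0, 0.0], [0.0, 0.5]])
print(abs(dens - norm_pdf(1.0, 0.0, 2.0) * norm_pdf(-0.5, 0.0, 0.5)) < 1e-12)  # True
```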

Singular Gaussians

The density requires the determinant \(|\Sigma |\) and the inverse \(\Sigma^{-1}\). These rely on the covariance matrix not being degenerate.

If the covariance matrix is degenerate we can instead use the pseudo inverse, and the pseudo determinant.
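A minimal sketch of the idea (Python; it assumes the covariance is already diagonalised, so the eigenvalues are just the diagonal entries — in general you would eigendecompose \(\Sigma\) first):

```python
# For a diagonal (hence already eigendecomposed) degenerate covariance, the
# pseudo-inverse inverts only the nonzero eigenvalues, and the
# pseudo-determinant is the product of the nonzero eigenvalues.
eigvals = [2.0, 0.5, 0.0]            # degenerate: one zero eigenvalue
tol = 1e-12

pinv_diag = [1 / v if v > tol else 0.0 for v in eigvals]
pseudo_det = 1.0
for v in eigvals:
    if v > tol:
        pseudo_det *= v

print(pinv_diag)    # [0.5, 2.0, 0.0]
print(pseudo_det)   # 1.0
```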

Extreme value distributions

Type-I - Gumbel distribution

The probability function is:

\(f(x)=\dfrac{1}{\beta }e^{-(\dfrac{x-\mu}{\beta }+e^{-\dfrac{x-\mu }{\beta }})}\)

We can use:

\(z=\dfrac{x-\mu }{\beta }\)

To get:

\(f(x)=\dfrac{1}{\beta }e^{-(z+e^{-z})}\)

The difference between two independent draws from a Gumbel distribution follows a logistic distribution.
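A quick Monte Carlo check of this fact (a Python sketch; standard Gumbel samples come from the inverse CDF \(z=-\ln(-\ln u)\)):

```python
import math
import random

random.seed(3)
beta = 2.0

def gumbel():
    # Inverse-CDF sampling: F(z) = exp(-exp(-z))  =>  z = -ln(-ln(u))
    u = random.random()
    return beta * -math.log(-math.log(u))

# The difference of two iid Gumbel(mu, beta) draws is logistic with scale beta.
diffs = [gumbel() - gumbel() for _ in range(200_000)]
x = 1.0
empirical = sum(d <= x for d in diffs) / len(diffs)
logistic_cdf = 1 / (1 + math.exp(-x / beta))
print(empirical, logistic_cdf)  # both ≈ 0.62
```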

Type-II - Fréchet distribution

Type-III - Reversed Weibull distribution

Mixture models

Gaussian Mixture Models

Mixture models

We have a latent variable which is part of the process.

The observed variable is distributed according to a parametric distribution, but the parameters are different for different latent classes.

There are \(K\) latent classes, and so \(K\) sets of parameters.

The population is weighted into the \(K\) classes.

We have a distribution, but we have different parameters for the distribution for different populations.

For example we could observe the height of men and women, where both are normally distributed but with different parameters.

Where there is a normal distribution, this is a Gaussian mixture model.

If there is more than one variable to observe, this is a multivariate Gaussian mixture model.

Gaussian Mixture Models (GMM)

In a Gaussian Mixture Model each non-latent variable has a normal distribution with a mean and variance. For multiple variables there is a covariance matrix.
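A minimal sampling sketch of the height example above (Python; the weights and the component means and standard deviations are made-up illustrative values):

```python
import random
import statistics

random.seed(4)

# Hypothetical two-component 1-D GMM: heights of two sub-populations.
weights = [0.5, 0.5]                     # mixing weights over the K = 2 classes
params = [(178.0, 7.0), (165.0, 6.0)]    # (mean, std dev) per latent class

def sample():
    # First draw the latent class, then draw from that class's Gaussian.
    k = random.choices(range(len(weights)), weights=weights)[0]
    mu, sigma = params[k]
    return random.gauss(mu, sigma)

xs = [sample() for _ in range(100_000)]
# The mixture mean is the weighted average of the component means.
print(statistics.mean(xs))  # ≈ 0.5*178 + 0.5*165 = 171.5
```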


Laplace distribution

Dirac distribution

Empirical distribution

Split-normal distribution