There is a set \(s\) such that:

\(P(x\in s)=p\)

\(P(x\not\in s)=0\)

This describes a uniform distribution over \(s\): every outcome lies in \(s\), and each element of \(s\) has the same probability \(p\). The mean of the distribution is the mean of the set \(s\).

If the set consists of all real numbers between two values, \(a\) and \(b\), then:

The mean is \(\dfrac{1}{2}(a+b)\).

The variance is \(\dfrac{(b-a)^2}{12}\) in the continuous case.

The variance is \(\dfrac{(b-a+1)^2-1}{12}\) in the discrete case.
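The discrete-case formulas can be checked by direct enumeration. This is a minimal sketch, assuming an example range of \(a=3\), \(b=10\):

```python
# Check the discrete uniform mean and variance by direct enumeration.
# Assumed example values: a = 3, b = 10 (inclusive integer range).
a, b = 3, 10
values = list(range(a, b + 1))
n = len(values)

mean = sum(values) / n
var = sum((v - mean) ** 2 for v in values) / n

assert mean == (a + b) / 2
assert abs(var - ((b - a + 1) ** 2 - 1) / 12) < 1e-9
```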

The outcome of a Bernoulli trial is either \(0\) or \(1\). We can describe it as:

\(P(1)=p\)

\(P(0)=1-p\)

It has a single parameter, \(p\).

The mean of a Bernoulli trial is \(E[X]=(1-p)(0)+(p)(1)=p\).

The variance of a Bernoulli trial is \(E[(X-\mu)^2]=(1-p)(0-\mu)^2+(p)(1-\mu)^2=(1-p)p^2+p(1-p)^2=p(1-p)\).
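The mean and variance above follow directly from summing over the two outcomes. A small numeric check, assuming an example parameter \(p=0.3\):

```python
# Mean and variance of a Bernoulli trial, computed from the two outcomes.
# Assumed example parameter: p = 0.3.
p = 0.3
outcomes = {0: 1 - p, 1: p}

mean = sum(x * prob for x, prob in outcomes.items())
var = sum(prob * (x - mean) ** 2 for x, prob in outcomes.items())

assert abs(mean - p) < 1e-12
assert abs(var - p * (1 - p)) < 1e-12
```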

If we repeat a Bernoulli trial \(n\) times with the same parameter and sum the results, we have the binomial distribution.

We therefore have two parameters, \(p\) and \(n\).

\(P(X=x)={n\choose x }p^x(1-p)^{n-x}\)

The mean is \(np\), which follows from adding the expectation \(p\) across the \(n\) trials.

Similarly, because the trials are independent, the variances can be added together, giving \(np(1-p)\).
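The mean and variance can also be recovered by summing directly over the binomial mass function. A sketch, assuming example parameters \(n=10\), \(p=0.4\):

```python
from math import comb

# Binomial pmf, with mean and variance computed by summation over all outcomes.
# Assumed example parameters: n = 10, p = 0.4.
n, p = 10, 0.4

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

mean = sum(x * binom_pmf(x) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x) for x in range(n + 1))

assert abs(mean - n * p) < 1e-9           # mean = np
assert abs(var - n * p * (1 - p)) < 1e-9  # variance = np(1-p)
```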

We can use the Poisson distribution to model the number of independent events that occur in a time period.

Over a very short time period, whether we observe an event is a Bernoulli trial.

\(P(1)=p\)

\(P(0)=1-p\)

Let’s consider the chance of repeatedly getting \(0\): \(P(0;t)\).

We can see that: \(P(0;t+\delta t)=P(0;t)(1-p)\).

And therefore:

\(P(0;t+\delta t)-P(0;t)=-pP(0;t)\)

By setting \(p=\lambda \delta t\):

\(\dfrac{P(0;t+\delta t)-P(0;t)}{\delta t}=-\lambda P(0;t)\)

Taking the limit as \(\delta t\to 0\):

\(\dfrac{dP(0;t)}{dt}=-\lambda P(0;t)\)

\(P(0;t)=Ce^{-\lambda t}\)

If \(t=0\) then \(P(0;t)=1\), because no events can have occurred in zero time, and so \(C=1\).

\(P(0;t)=e^{-\lambda t}\)
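The limiting argument can be checked numerically: divide \([0,t]\) into many short steps, treat each as a Bernoulli trial with \(p=\lambda\,\delta t\), and compare the probability of no events with \(e^{-\lambda t}\). A sketch, assuming example values \(\lambda=2\), \(t=1.5\):

```python
from math import exp

# Divide [0, t] into many short steps of length dt; each step is a Bernoulli
# trial with p = lam * dt. The chance of seeing no events at all is then
# (1 - lam*dt)^(t/dt), which approaches e^(-lam*t) as dt shrinks.
# Assumed example values: lam = 2.0, t = 1.5.
lam, t = 2.0, 1.5
steps = 1_000_000
dt = t / steps

p_zero = (1 - lam * dt) ** steps

assert abs(p_zero - exp(-lam * t)) < 1e-4
```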

The mass function for the binomial case is:

\(f(x)=\dfrac{n!}{x!(n-x)!}p^x(1-p)^{n-x}\)

The multinomial distribution generalises this to the case where there are more than \(2\) outcomes.

\(f(x_1,...,x_n)=\dfrac{n!}{\prod_i x_i!}\prod_i p_i^{x_i}\)
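One sanity check on the multinomial mass function is that it sums to \(1\) over all outcome vectors with \(\sum_i x_i = n\). A sketch, assuming an example with \(n=4\) trials and probabilities \((0.2, 0.3, 0.5)\):

```python
from math import factorial, prod
from itertools import product

# Multinomial pmf; summing over all outcome vectors with x_1+...+x_k = n
# should give 1. Assumed example: n = 4 trials, probabilities (0.2, 0.3, 0.5).
n = 4
p = (0.2, 0.3, 0.5)

def multinomial_pmf(xs):
    coeff = factorial(n) / prod(factorial(x) for x in xs)
    return coeff * prod(pi**x for pi, x in zip(p, xs))

total = sum(
    multinomial_pmf(xs)
    for xs in product(range(n + 1), repeat=len(p))
    if sum(xs) == n
)
assert abs(total - 1.0) < 1e-12
```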

The power-law distribution has density:

\(P(X)=\dfrac{\alpha -1}{a}\left(\dfrac{x}{a}\right)^{-\alpha }\)

Where \(a\) is the lower bound.

\(P(X)=0\) for \(X<a\).

\(E[X^m]=\dfrac{\alpha - 1}{\alpha -1 -m }a^m\)

If \(m\ge \alpha -1 \) then this is not well defined.

Higher-order moments, such as the variance, then do not exist.
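The moment formula can be checked by numerically integrating \(x^m\) against the density. A sketch, assuming example values \(a=1\), \(\alpha=4\), \(m=2\) (so the formula is well defined):

```python
# Numerically integrate x^m against the power-law density on [a, upper] and
# compare with the closed form (alpha-1)/(alpha-1-m) * a^m.
# Assumed example values: a = 1, alpha = 4, m = 2.
a, alpha, m = 1.0, 4.0, 2
upper, steps = 2_000.0, 500_000
dx = (upper - a) / steps

def f(x):
    return (alpha - 1) / a * (x / a) ** (-alpha)

# Midpoint rule; the truncated tail beyond `upper` is negligible here.
moment = sum(
    f(a + (i + 0.5) * dx) * (a + (i + 0.5) * dx) ** m * dx for i in range(steps)
)
closed_form = (alpha - 1) / (alpha - 1 - m) * a**m

assert abs(moment - closed_form) < 1e-2
```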

The logistic distribution has the cumulative distribution function:

\(F(x)=\dfrac{1}{1+e^{-\dfrac{x-\mu }{s}}} \)
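Two simple consequences of this CDF are that \(F(\mu)=\frac{1}{2}\) and that the distribution is symmetric about \(\mu\). A sketch, assuming example parameters \(\mu=1\), \(s=2\):

```python
from math import exp

# Logistic CDF; it equals 1/2 at the location parameter mu and is symmetric
# about mu. Assumed example parameters: mu = 1.0, s = 2.0.
mu, s = 1.0, 2.0

def logistic_cdf(x):
    return 1 / (1 + exp(-(x - mu) / s))

assert abs(logistic_cdf(mu) - 0.5) < 1e-12
# Symmetry: F(mu + d) + F(mu - d) = 1 for any d.
assert abs(logistic_cdf(mu + 3) + logistic_cdf(mu - 3) - 1) < 1e-12
```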

The Lévy distribution is a continuous probability distribution.

Its probability density function is:

\(P(X)=\sqrt {\dfrac{c}{2\pi }}\dfrac{e^{-\dfrac{c}{2(x-\mu )}}}{(x-\mu )^{\dfrac{3}{2}}}\)

The Gaussian (normal) distribution has density function:

\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2 }} e^{-\dfrac{(x-\mu)^2}{2\sigma^2 }}\)

For univariate:

\(x \sim N(\mu, \sigma^2 )\)

We define the multivariate Gaussian distribution as the distribution for which any linear combination of the components is Gaussian.

For multivariate:

\(X \sim N(\mu, \Sigma )\)

Where \(\mu \) is now a vector, and \(\Sigma \) is the covariance matrix.

The density function is:

\(f_x=\dfrac{1}{\sqrt {(2\pi )^n|\Sigma |}} e^{-\dfrac{1}{2}(x-\mu )^T\Sigma^{-1}(x-\mu)}\)

For the univariate Gaussian it is:

\(f_x=\dfrac{1}{\sqrt {2\pi \sigma^2 }} e^{-\dfrac{1}{2\sigma^2}(x-\mu )^2}\)

This is the same as the multivariate form with \(n=1\).

We need the determinant \(|\Sigma |\) and the inverse \(\Sigma^{-1}\). Both rely on the covariance matrix not being degenerate.

If the covariance matrix is degenerate we can instead use the pseudo inverse, and the pseudo determinant.
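The pseudo-inverse and pseudo-determinant can be computed from the eigendecomposition of \(\Sigma\), keeping only the non-zero eigenvalues. A sketch using NumPy, with an assumed example of a rank-1 covariance in two dimensions:

```python
import numpy as np

# Multivariate Gaussian density using the pseudo-inverse and pseudo-determinant,
# so it also works when the covariance matrix is degenerate (rank-deficient).
def gaussian_density(x, mu, sigma):
    rank = np.linalg.matrix_rank(sigma)
    pinv = np.linalg.pinv(sigma)              # pseudo-inverse
    eigvals = np.linalg.eigvalsh(sigma)
    pdet = np.prod(eigvals[eigvals > 1e-12])  # product of non-zero eigenvalues
    diff = x - mu
    norm = 1 / np.sqrt((2 * np.pi) ** rank * pdet)
    return norm * np.exp(-0.5 * diff @ pinv @ diff)

# Assumed example: a degenerate (rank-1) covariance matrix in 2 dimensions.
mu = np.zeros(2)
sigma = np.array([[1.0, 1.0], [1.0, 1.0]])  # eigenvalues 0 and 2
density = gaussian_density(mu, mu, sigma)   # density evaluated at the mean
```

At the mean the exponential term is \(1\), so the density reduces to the normalising constant \(1/\sqrt{(2\pi)^{\text{rank}}\,\tilde{|\Sigma |}}\).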

The Gumbel distribution has density function:

\(f(x)=\dfrac{1}{\beta }e^{-(\dfrac{x-\mu}{\beta }+e^{-\dfrac{x-\mu }{\beta }})}\)

We can use:

\(z=\dfrac{x-\mu }{\beta }\)

To get:

\(f(x)=\dfrac{1}{\beta }e^{-(z+e^{-z})}\)

The difference between two draws from a Gumbel distribution follows a logistic distribution.
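This relationship can be checked by simulation: Gumbel samples can be drawn by inverting the CDF \(F(x)=e^{-e^{-z}}\), giving \(x=\mu-\beta\ln(-\ln u)\) for uniform \(u\). A sketch, assuming example parameters \(\mu=0\), \(\beta=1\):

```python
import random
from math import log, exp

# Draw Gumbel samples by inverting the CDF F(x) = exp(-exp(-(x-mu)/beta)):
# x = mu - beta * log(-log(u)) for u uniform on (0, 1). The difference of two
# independent Gumbel(mu, beta) draws should follow a logistic distribution
# with location 0 and scale beta. Assumed example parameters: mu = 0, beta = 1.
random.seed(0)
mu, beta = 0.0, 1.0

def gumbel_sample():
    return mu - beta * log(-log(random.random()))

diffs = [gumbel_sample() - gumbel_sample() for _ in range(200_000)]

# Empirical CDF of the differences at x = 1 vs the logistic CDF there.
x = 1.0
empirical = sum(d <= x for d in diffs) / len(diffs)
logistic = 1 / (1 + exp(-x / beta))
assert abs(empirical - logistic) < 0.01
```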

In a mixture model we have a latent variable which is part of the generating process.

The observed variable is distributed according to a parametric distribution, but the parameters are different for different latent classes.

There are \(K\) latent classes, and so \(K\) sets of parameters.

The population is divided between the \(K\) classes according to a set of weights.

That is, we have one family of distributions, but different parameters for different sub-populations.

For example we could observe the height of men and women, where both are normally distributed but with different parameters.

When the component distributions are normal, this is a Gaussian mixture model.

If there is more than one variable to observe, this is a multivariate Gaussian mixture model.

In a Gaussian mixture model each observed variable has a normal distribution with a mean and a variance. With multiple observed variables, each class instead has a mean vector and a covariance matrix.
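Sampling from a Gaussian mixture follows the generating process directly: pick a latent class according to the weights, then draw from that class's normal distribution. A sketch using the height example, with hypothetical example parameters:

```python
import random

# Sampling sketch for a Gaussian mixture model: pick a latent class according
# to the mixture weights, then draw from that class's normal distribution.
# All numbers below are hypothetical example values (heights in cm).
random.seed(1)
weights = [0.5, 0.5]      # weighting of the K = 2 latent classes
means = [165.0, 178.0]    # per-class means
stds = [6.0, 7.0]         # per-class standard deviations

def sample_gmm():
    k = random.choices(range(len(weights)), weights=weights)[0]
    return random.gauss(means[k], stds[k])

samples = [sample_gmm() for _ in range(100_000)]
overall_mean = sum(samples) / len(samples)

# The mixture mean is the weighted average of the class means: 171.5 here.
assert abs(overall_mean - 171.5) < 0.5
```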