Point estimates of probability distributions

Point estimates for parameters


When we compute statistics from a sample, we are often concerned with inferring properties of the underlying probability distribution.

As the properties of the probability distribution function affect the chance of observing the sample, we can analyse samples to infer properties of the underlying distribution.

There are many properties we could be interested in, including moments and parameters of a specific probability distribution function.

An estimator is a statistic which is our estimate of one of these values.

Note that statistics and estimators are different things: a statistic may be a terrible estimator, yet still be useful for other purposes.

Sufficient statistics

We can make estimates of a population parameter using statistics from the sample.

A statistic is sufficient if it contains all the information needed to estimate the parameter.

We can describe the role of a parameter \(\theta \) and a statistic \(t\) in generating the data as:

\(P(x|\theta, t)\)

\(t\) is a sufficient statistic for \(\theta \) if:

\(P(x|t)=P(x|\theta, t)\)
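As a sketch of what this means in practice (the example and code are illustrative, not from the source): for \(n\) Bernoulli trials the sum is a sufficient statistic for \(p\), so conditional on the sum, the distribution over sequences should not depend on \(p\). A small Monte Carlo check:

```python
import random

def conditional_dist(p, t=2, n=3, trials=200_000, seed=0):
    """Monte Carlo estimate of P(x | t) for n Bernoulli(p) draws,
    conditioning on the sufficient statistic t = sum of the draws."""
    rng = random.Random(seed)
    counts = {}
    kept = 0
    for _ in range(trials):
        x = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(x) == t:
            counts[x] = counts.get(x, 0) + 1
            kept += 1
    return {x: c / kept for x, c in counts.items()}

# Conditional on t = 2, each of the 3 sequences with two successes
# should have probability 1/3 regardless of p.
d_low = conditional_dist(p=0.3)
d_high = conditional_dist(p=0.7)
for x in sorted(d_low):
    print(x, round(d_low[x], 3), round(d_high[x], 3))
```

Both conditional distributions are approximately uniform over the three sequences with two successes: once we know the sum, the data carry no further information about \(p\).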

Properties of point estimators

Estimator error and bias

Error of an estimator

The error of an estimator is the difference between it and the actual parameter.

\(Error_{\theta }[\hat \theta ]=\hat \theta - \theta \)

Bias of an estimator

The bias of an estimator is the expected error.

\(Bias_\theta [\hat \theta ]:=E_\theta [\hat \theta -\theta ]\)

By linearity of expectation, this is equivalent to:

\(Bias_\theta [\hat \theta ]=E_\theta [\hat \theta ]-\theta \)
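For example (an illustrative sketch, not from the source), the variance estimator that divides by \(n\) is biased downwards by a factor \((n-1)/n\), while dividing by \(n-1\) removes the bias. We can check this by simulation:

```python
import random

def biased_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)        # divides by n

def unbiased_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # divides by n - 1

rng = random.Random(1)
n, reps, true_var = 5, 100_000, 1.0
b = sum(biased_var([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)) / reps
u = sum(unbiased_var([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)) / reps
print(f"E[biased]   = {b:.3f} (theory: {(n - 1) / n * true_var:.3f})")
print(f"E[unbiased] = {u:.3f} (theory: {true_var:.3f})")
```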

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) of an estimator

Mean squared error

The mean squared error is defined as:

\(MSE = E[(\hat \theta - \theta )^2]=E[((\hat \theta - E[\hat \theta ])+(E[\hat \theta ]-\theta ))^2]\)

\(MSE = E[(\hat \theta - \theta )^2]=E[(\hat \theta - E[\hat \theta ])^2+(E[\hat \theta ]-\theta )^2+2(E[\hat \theta ]-\theta )(\hat \theta- E[\hat \theta ])]\)

\(MSE = E[(\hat \theta - \theta )^2]=E[(\hat \theta - E[\hat \theta ])^2]+E[(E[\hat \theta ]-\theta)^2] +E[2(E[\hat \theta ]-\theta )(\hat \theta- E[\hat \theta ])]\)

\(MSE = E[(\hat \theta - \theta )^2]=Var(\hat \theta )+(E[\hat \theta ]-\theta)^2 +2(E[\hat \theta ]-\theta )E[\hat \theta- E[\hat \theta ]]\)

The final term is zero, because \(E[\hat \theta - E[\hat \theta ]]=E[\hat \theta ]-E[\hat \theta ]=0\), leaving:

\(MSE = E[(\hat \theta - \theta )^2]=Var(\hat \theta )+Bias (\hat \theta )^2\)
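We can verify the decomposition numerically (an illustrative sketch; the shrinkage estimator here is my own example): for a deliberately biased estimator, the empirical MSE equals the empirical variance plus the squared empirical bias:

```python
import random

rng = random.Random(2)
theta, n, reps = 2.0, 10, 100_000

# A deliberately biased estimator: shrink the sample mean towards 0.
ests = [0.8 * sum(rng.gauss(theta, 1) for _ in range(n)) / n for _ in range(reps)]

mean_est = sum(ests) / reps
mse = sum((e - theta) ** 2 for e in ests) / reps
var = sum((e - mean_est) ** 2 for e in ests) / reps
bias = mean_est - theta  # theory: 0.8 * theta - theta = -0.4

print(f"MSE = {mse:.4f}, Var + Bias^2 = {var + bias ** 2:.4f}")
```

The identity holds exactly for the empirical moments, because the cross term sums to zero by construction of the empirical mean.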

Root Mean Square Error (RMSE)

This is the square root of the MSE.

It is also called the Root Mean Square Deviation (RMSD).

Asymptotic properties of estimators

Consistency and efficiency of estimators


A statistic \(\hat \theta \) is a consistent estimator for \(\theta \) if it converges in probability to \(\theta \) as the sample size grows.

That is:

\(\hat \theta\rightarrow^p \theta \)

We can often show that an estimator is consistent by writing \(\hat \theta -\theta \) as a function of \(n\) and showing that it tends to \(0\) as \(n\rightarrow \infty \).
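For example (illustrative code, not from the source), the sample mean is a consistent estimator of the population mean, and its average absolute error shrinks as \(n\) grows:

```python
import random

rng = random.Random(3)
mu = 5.0

def sample_mean_error(n, reps=2_000):
    """Average absolute error of the sample mean of n draws from
    N(mu, 2^2), estimated over many replications."""
    return sum(abs(sum(rng.gauss(mu, 2) for _ in range(n)) / n - mu)
               for _ in range(reps)) / reps

errors = {n: sample_mean_error(n) for n in (10, 100, 1000)}
for n, e in errors.items():
    print(f"n = {n:4d}: mean |error| = {e:.4f}")
```

The error shrinks roughly like \(1/\sqrt{n}\), as expected for the sample mean.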


Efficiency measures the speed at which a consistent estimator tends towards the true value.

Note that an estimator can be fairly efficient yet still biased. Efficiency is measured as:

\(e(\hat \theta )=\dfrac{\dfrac{1}{I(\theta )}}{Var (\hat \theta )}\)

If an estimator has an efficiency of \(1\) and is unbiased, we call it efficient.

Relative efficiency

We can measure the relative efficiency of two consistent estimators:

The relative efficiency is the variance of the first estimator, divided by the variance of the second.
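As an illustration (my own example), for normal data both the sample mean and the sample median are consistent estimators of the centre, but the median's variance is asymptotically \(\pi /2\approx 1.57\) times that of the mean:

```python
import random
import statistics

rng = random.Random(4)
n, reps = 101, 20_000

means, medians = [], []
for _ in range(reps):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(statistics.median(xs))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

# For normal data this ratio tends to pi/2, about 1.57.
print(f"relative efficiency (median vs mean) = {var_median / var_mean:.2f}")
```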

Root-n estimators

An estimator is root-n consistent if it is consistent and its variance is:

\(O(\dfrac{1}{n})\)
\(n^\delta \)-consistent estimators

A consistent estimator is \(n^\delta \)-consistent if its variance is:

\(O(\dfrac{1}{n^{2 \delta }})\)
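For example (a sketch with assumed standard-normal data), the sample mean has variance \(\sigma ^2/n\), i.e. \(O(1/n)\), corresponding to \(\delta =1/2\): it is root-n consistent. Quadrupling \(n\) should quarter the variance:

```python
import random
import statistics

rng = random.Random(5)
reps = 20_000

def var_of_mean(n):
    """Sampling variance of the mean of n standard-normal draws."""
    return statistics.pvariance(
        [sum(rng.gauss(0, 1) for _ in range(n)) / n for _ in range(reps)])

v10, v40 = var_of_mean(10), var_of_mean(40)
print(f"Var(n=10) = {v10:.4f}, Var(n=40) = {v40:.4f}, ratio = {v10 / v40:.1f}")
```

The ratio is close to \(4\), matching the \(1/n\) scaling.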

Cramér-Rao lower bound

For an unbiased estimator, the variance cannot fall below the Cramér-Rao lower bound.

\(Var (\hat \theta )\ge \dfrac{1}{I(\theta )}\)

Where \(I(\theta )\) is the Fisher information.

We can prove this.

We have the score:

\(V=\dfrac{\partial }{\partial \theta }\ln f(X, \theta )\)

\(V=\dfrac{1}{f(X, \theta )}\dfrac{\partial }{\partial \theta } f(X, \theta )\)

The expectation of the score is \(0\):

\(E[V]=E[\dfrac{1}{f(X, \theta )}\dfrac{\partial }{\partial \theta } f(X, \theta )]\)

\(E[V]=\int \dfrac{1}{f(x, \theta )}\dfrac{\partial }{\partial \theta } f(x, \theta )f(x, \theta )dx\)

\(E[V]=\int \dfrac{\partial }{\partial \theta } f(x, \theta )dx=\dfrac{\partial }{\partial \theta }\int f(x, \theta )dx=\dfrac{\partial }{\partial \theta }1=0\)
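As a numerical illustration of the bound (my own sketch): for Bernoulli(\(p\)) data, the Fisher information of one observation is \(1/(p(1-p))\), so the bound for \(n\) observations is \(p(1-p)/n\). The sample mean is unbiased and attains it:

```python
import random
import statistics

rng = random.Random(6)
p, n, reps = 0.3, 50, 50_000

# Sampling variance of the sample mean of n Bernoulli(p) draws.
phats = [sum(1 if rng.random() < p else 0 for _ in range(n)) / n
         for _ in range(reps)]
var_phat = statistics.pvariance(phats)

# Cramer-Rao bound: p(1-p)/n, since I(p) = 1/(p(1-p)) per observation.
bound = p * (1 - p) / n
print(f"Var(p_hat) = {var_phat:.5f}, Cramer-Rao bound = {bound:.5f}")
```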

Bias-Variance trade-off

If we care about the MSE \(E[(\hat \theta -\theta )^2]\), then we may not want an unbiased estimator: by adding some bias we can reduce the variance substantially.
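A small simulation of this trade-off (the shrinkage factor \(c=0.8\) and the parameter values are my own assumptions): when the true \(\theta \) is close to zero, shrinking the sample mean towards zero adds bias but lowers the MSE:

```python
import random

rng = random.Random(7)
theta, n, reps, c = 0.2, 10, 100_000, 0.8  # c shrinks the mean towards 0

def mse(estimates):
    return sum((e - theta) ** 2 for e in estimates) / len(estimates)

xbars = [sum(rng.gauss(theta, 1) for _ in range(n)) / n for _ in range(reps)]
mse_u = mse(xbars)                    # unbiased: theory 1/n = 0.1
mse_s = mse([c * x for x in xbars])   # biased but lower variance

print(f"MSE(unbiased mean)  = {mse_u:.4f}")
print(f"MSE(shrunk, biased) = {mse_s:.4f}")
```

Here the shrunk estimator has bias \((c-1)\theta \) but variance \(c^2/n\), and its total MSE is lower than the unbiased mean's.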


Testing estimators

We can assess estimators of parametric models using Monte Carlo simulations.
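For instance (an illustrative sketch), we can assess the maximum-likelihood estimator \(\hat \lambda =1/\bar x\) of an exponential rate by simulation; its finite-sample bias \(E[\hat \lambda ]=n\lambda /(n-1)\) shows up clearly:

```python
import random

rng = random.Random(8)
lam, n, reps = 2.0, 10, 100_000

# Monte Carlo assessment of the MLE lambda_hat = 1 / sample_mean
# for the rate of an exponential distribution.
ests = [n / sum(rng.expovariate(lam) for _ in range(n)) for _ in range(reps)]
mean_est = sum(ests) / reps

# Known finite-sample result: E[lambda_hat] = n * lam / (n - 1).
print(f"E[lambda_hat] = {mean_est:.3f} vs true rate {lam}")
```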


Loss functions for point estimates; confidence intervals for point estimates.

Estimator properties

Best Asymptotically Normal (BAN) estimators, also known as Consistent Asymptotically Normal Efficient (CANE) estimators.

These are root-n consistent.

Feasible and infeasible estimators

A feasible estimator uses only known terms; an infeasible estimator depends on unknown ones.

For example, an estimator depending on an unknown matrix \(\Omega \) is infeasible, unless we assume its form, making it feasible.

Bias etc

Pages: Cramér-Rao; Minimum-Variance Unbiased Estimators (MVUE).

Unbiased estimators for some kernel value can be used to estimate population moments.

Rao-Blackwell theorem

One step and k-step estimators

In the Cramér-Rao section?

Delta method

In the bias section?

We can consider \(X_n\) to be a sequence, and we are interested in the asymptotic properties of this sequence.

Fat tails

Section on fat tails: cannot estimate the population mean from the sample mean; the method of moments requires non-fat tails; correlation/covariance with fat tails.