Statistics

Creating statistics

Creating statistics

We take a sample from the distribution.

\(x=(x_1, x_2,...,x_n)\)

A statistic is a function on this sample.

\(S=S(x_1, x_2,...,x_n)\).
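As a minimal sketch (the distribution and the particular statistics below are illustrative assumptions), we can draw a sample and evaluate statistics on it:

```python
import numpy as np

# Draw a sample x = (x_1, ..., x_n) from an assumed distribution.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

# A statistic S = S(x_1, ..., x_n) is just a function of the sample.
sample_mean = np.mean(x)
sample_max = np.max(x)
print(sample_mean, sample_max)
```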

Moments of statistics

Bias from single and joint estimation

Bias from single estimation

We have a model \(y_i=\mathbf x_i\theta +g(\mathbf z_i)+\epsilon_i\). Because \(\mathbf x_i\) and \(\mathbf z_i\) are not independent, we cannot estimate just \(y_i=\mathbf x_i\theta +\epsilon_i\): omitting \(g(\mathbf z_i)\) biases our estimate of \(\theta \).

Bias from joint estimation

We could estimate our equation with a single ML algorithm.

\(y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i\)

For example, using LASSO.

However, this would introduce bias into our estimates of \(\theta \).

Bias from iterative estimation

We could iteratively estimate both \(\theta \) and \(g(\mathbf z_i)\).

For example, iteratively doing OLS for \(\theta \) and random forests for \(g(\mathbf z_i)\).

This would also introduce bias into \(\theta \).
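As a rough sketch of this iterative approach (the simulated data-generating process, the learners, and the iteration count are illustrative assumptions, using numpy and scikit-learn), we can alternate between OLS for \(\theta \) and a random forest for \(g(\mathbf z_i)\) and compare the result with the true coefficient:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
z = rng.uniform(-2, 2, size=(n, 2))
x = z[:, 0] + rng.normal(size=n)          # x depends on z, so they are not independent
theta = 1.5                                # true coefficient of interest
g = np.sin(3 * z[:, 0]) + z[:, 1] ** 2     # nuisance function g(z)
y = theta * x + g + rng.normal(size=n)

# Alternate: OLS for theta given g_hat, then a random forest for g given theta_hat.
g_hat = np.zeros(n)
for _ in range(5):
    ols = LinearRegression().fit(x.reshape(-1, 1), y - g_hat)
    theta_hat = ols.coef_[0]
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    g_hat = rf.fit(z, y - theta_hat * x).predict(z)

print(theta_hat)  # compare with the true value of 1.5
```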

Asymptotic properties of statistics

Asymptotic distributions

\(f(\hat \theta )\rightarrow^d G \)

Where \(G\) is some distribution.

Asymptotic mean and variance

Asymptotic normality

Many statistics are asymptotically normally distributed.

This is a result of the central limit theorem.

For example:

\(\sqrt n (S-s)\rightarrow^d N(0, \sigma^2) \)

Where \(s\) is the probability limit of the statistic \(S\).

Confidence intervals for asymptotically normal statistics

We have the mean and variance, and know the distribution. This allows us to calculate confidence intervals.
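As a minimal sketch (the sample, the statistic, and the 95% confidence level are illustrative assumptions), a normal-approximation confidence interval for the sample mean uses the estimated standard error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)

s_hat = np.mean(x)                          # asymptotically normal statistic
se = np.std(x, ddof=1) / np.sqrt(len(x))    # estimated standard error
z = stats.norm.ppf(0.975)                   # critical value for a 95% interval
print(s_hat - z * se, s_hat + z * se)
```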

Order statistics

Order statistics

Defining order statistics

The \(k\)th order statistic is the \(k\)th smallest value in a sample.

\(x_{(1)}\) is the smallest value in a sample, the minimum.

\(x_{(n)}\) is the largest value in a sample, the maximum.
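As a small sketch (the sample below is an illustrative assumption), sorting the sample gives the order statistics directly:

```python
import numpy as np

x = np.array([4.2, 1.7, 3.9, 0.8, 2.5])
x_sorted = np.sort(x)    # x_(1) <= x_(2) <= ... <= x_(n)
print(x_sorted[0])       # x_(1): the minimum
print(x_sorted[2])       # x_(3): the 3rd order statistic
print(x_sorted[-1])      # x_(n): the maximum
```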

Probability distributions of order statistics

The probability distribution of order statistics depends on the underlying probability distribution.

Probability distribution of sample maximum

If we have:

\(Y=\max \mathbf X\)

The probability distribution is:

\(P(Y\le y)=P(X_1\le y, X_2\le y,...,X_n\le y)\)

If these are iid we have:

\(P(Y\le y)=\prod_i P(X_i\le y)\)

\(F_Y(y)=F_X(y)^n\)

The density function is:

\(f_Y(y)=nF_X(y)^{n-1}f_X(y)\)
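As a quick numerical check of \(F_Y(y)=F_X(y)^n\) (the uniform distribution, sample size, and number of replications are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000
maxima = rng.uniform(0, 1, size=(reps, n)).max(axis=1)  # Y = max X for each replication

y = 0.8
print(np.mean(maxima <= y))  # empirical P(Y <= y)
print(y ** n)                # F_X(y)^n for Uniform(0, 1), where F_X(y) = y
```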

Probability distribution of the sample minimum

If we have:

\(Y=\min \mathbf X\)

The probability that all observations exceed \(y\) is:

\(P(Y> y)=P(X_1> y, X_2> y,...,X_n> y)\)

If these are iid we have:

\(P(Y> y)=\prod_i P(X_i> y)\)

\(F_Y(y)=1-[1-F_X(y)]^n\)

The density function is:

\(f_Y(y)=n[1-F_X(y)]^{n-1}f_X(y)\)
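For example, if the \(X_i\) are iid exponential with rate \(\lambda\), so that \(F_X(y)=1-e^{-\lambda y}\), then:

\(F_Y(y)=1-[1-F_X(y)]^n=1-e^{-n\lambda y}\)

So the minimum of \(n\) iid exponentials is itself exponential, with rate \(n\lambda\).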

Bootstrapping

Bootstrapping

If we have a sample of size \(n\), we can create bootstrap samples by drawing \(n\) observations from it with replacement.
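As a minimal sketch (the sample, the statistic, and the number of bootstrap samples are illustrative assumptions), bootstrap samples can be used to estimate the standard error of a statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)   # original sample of size n
n, b = len(x), 2000

boot_stats = np.empty(b)
for j in range(b):
    resample = rng.choice(x, size=n, replace=True)  # draw n values with replacement
    boot_stats[j] = np.median(resample)             # statistic on the bootstrap sample

print(np.std(boot_stats, ddof=1))  # bootstrap estimate of the standard error of the median
```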

Jackknifing

The jackknife

We have a statistic:

\(S(x_1, x_2,...,x_n)\)

We may want to estimate moments of this statistic, such as its variance, but be unable to do so analytically.

The jackknife estimator

The jackknife is an approach for getting moments for statistics.

We start by creating \(n\) statistics each leaving out one observation.

\(\bar S_i=S(x_1,x_2,...,x_{i-1},x_{i+1},...,x_n)\)

We define:

\(\bar S=\dfrac{1}{n}\sum_i\bar S_i\)

Moments of the jackknife estimator

We want to estimate the variance of \(S\).

\(\widehat{Var}(S)=\dfrac{n-1}{n}\sum_i(\bar S_i-\bar S)^2\).
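As a minimal sketch (the sample and the statistic are illustrative assumptions), the leave-one-out recipe above can be coded directly:

```python
import numpy as np

def jackknife_variance(x, stat):
    """Jackknife variance estimate: (n-1)/n * sum_i (S_bar_i - S_bar)^2."""
    n = len(x)
    # S_bar_i: the statistic computed with observation i left out
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
print(jackknife_variance(x, np.mean))  # close to var(x)/n for the sample mean
```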

The infinitesimal jackknife

The jackknife as a weighting

In the jackknife we calculate the statistic leaving one observation out.

This is the same as weighting the observations, giving one observation a weight of \(0\) and all the others a weight of \(1\).

The infinitesimal jackknife

For the infinitesimal jackknife we do not reduce the weight to \(0\), but instead reduce it by an infinitesimal amount.
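As a rough sketch (the weighted statistic, the sample, and the finite-difference step are illustrative assumptions), we can approximate this numerically by nudging each observation's weight, treating the resulting directional derivatives as influence values, and combining them into a variance estimate:

```python
import numpy as np

def infinitesimal_jackknife_var(x, weighted_stat, eps=1e-6):
    """Variance estimate (1/n^2) * sum_i U_i^2, where U_i is the numerical
    derivative of the statistic as observation i's weight is increased."""
    n = len(x)
    w0 = np.full(n, 1.0 / n)    # equal weights reproduce the usual statistic
    base = weighted_stat(x, w0)
    U = np.empty(n)
    for i in range(n):
        w = (1 - eps) * w0
        w[i] += eps             # shift an infinitesimal amount of weight to x_i
        U[i] = (weighted_stat(x, w) - base) / eps
    return np.sum(U ** 2) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
weighted_mean = lambda x, w: np.sum(w * x)   # statistic written as a function of weights
print(infinitesimal_jackknife_var(x, weighted_mean))  # roughly var(x)/n for the mean
```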