We take a sample from the distribution.

\(x=(x_1, x_2,...x_n)\)

A statistic is a function of this sample.

\(S=S(x_1, x_2,...,x_n)\).
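
As a concrete illustration (the sample mean and sample maximum as two examples of \(S\); the distribution and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)  # sample from N(2, 1)

# A statistic is any function of the sample:
s_mean = np.mean(x)   # the sample mean
s_max = np.max(x)     # the sample maximum
```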

Suppose \(\mathbf x_i\) and \(\mathbf z_i\) are not independent. We then cannot consistently estimate \(\theta \) from \(y_i=\mathbf x_i\theta +\epsilon_i\) alone, because the omitted \(\mathbf z_i\) is correlated with \(\mathbf x_i\).

We could estimate our equation with a single ML algorithm.

\(y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i\)

For example, using LASSO.

However, the regularisation in the ML step would introduce bias into our estimates of \(\theta \).

We could iteratively estimate both \(\theta \) and \(g(\mathbf z_i)\).

For example, iteratively doing OLS for \(\theta \) and random forests for \(g(\mathbf z_i)\).

This would also introduce bias into \(\theta \).
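
A minimal numpy-only sketch of this iterative scheme on simulated data. The data-generating process is invented for illustration, and a simple binned-mean smoother stands in for the random forest:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.uniform(0, 1, size=n)
x = z + rng.normal(scale=0.5, size=n)      # x and z are correlated
theta_true = 2.0
y = theta_true * x + np.sin(2 * np.pi * z) + rng.normal(scale=0.1, size=n)

def smooth(z, r, bins=20):
    """Binned-mean smoother standing in for the random forest step."""
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(z, edges) - 1, 0, bins - 1)
    means = np.array([r[idx == b].mean() for b in range(bins)])
    return means[idx]

g_hat = np.zeros(n)
for _ in range(50):
    # OLS for theta on the partial residual y - g_hat
    theta_hat = np.sum(x * (y - g_hat)) / np.sum(x * x)
    # Nonparametric fit of g on the other partial residual
    g_hat = smooth(z, y - theta_hat * x)

print(theta_hat)  # near theta_true, but not unbiased in general
```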

A statistic built from an estimator converges in distribution:

\(f(\hat \theta )\rightarrow^d G \)

Where \(G\) is some distribution.

Many statistics are asymptotically normally distributed.

This is a result of the central limit theorem.

For example:

\(\sqrt n (S-s)\rightarrow^d N(0, \sigma^2) \)

Where \(s\) is the population value the statistic estimates.

We have the mean and variance, and we know the limiting distribution. This allows us to calculate confidence intervals.
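
A quick simulation of asymptotic normality, using the sample mean of Exponential(1) draws (so the true mean is \(1\) and \(\sigma^2=1\); seeds and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000

# Sample means of Exponential(1) draws: true mean 1, variance 1.
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# sqrt(n) * (S - s) should be approximately N(0, sigma^2).
scaled = np.sqrt(n) * (means - 1.0)
print(scaled.mean(), scaled.std())   # roughly 0 and 1

# 95% confidence interval for one sample, via the normal approximation.
x = rng.exponential(scale=1.0, size=n)
half = 1.96 * x.std(ddof=1) / np.sqrt(n)
ci = (x.mean() - half, x.mean() + half)
```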

The \(k\)th order statistic is the \(k\)th smallest value in a sample.

\(x_{(1)}\) is the smallest value in a sample, the minimum.

\(x_{(n)}\) is the largest value in a sample, the maximum.
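
In code, the order statistics are just the sorted sample (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10)

x_sorted = np.sort(x)        # x_(1), x_(2), ..., x_(n)
minimum = x_sorted[0]        # x_(1), the minimum
maximum = x_sorted[-1]       # x_(n), the maximum
k = 3
kth = x_sorted[k - 1]        # the kth order statistic
```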

The probability distribution of order statistics depends on the underlying probability distribution.

If we have:

\(Y=\max \mathbf X\)

The probability distribution is:

\(P(Y\le y)=P(X_1\le y, X_2\le y,...,X_n\le y)\)

If these are iid we have:

\(P(Y\le y)=\prod_i P(X_i\le y)\)

\(F_Y(y)=F_X(y)^n\)

The density function is:

\(f_Y(y)=nF_X(y)^{n-1}f_X(y)\)
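
For Uniform(0, 1) draws, \(F_X(y)=y\), so the maximum has CDF \(y^n\). A quick simulation check (the choice of \(n\) and \(y\) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 5, 100000

# Maximum of n iid Uniform(0, 1) draws, repeated many times.
maxima = rng.uniform(size=(reps, n)).max(axis=1)

y = 0.8
empirical = np.mean(maxima <= y)   # P(Y <= y) from simulation
theoretical = y ** n               # F_X(y)^n = y^n
print(empirical, theoretical)
```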

If we have:

\(Y=\min \mathbf X\)

It is easier to work with the complementary event:

\(P(Y> y)=P(X_1> y, X_2> y,...,X_n> y)\)

If these are iid we have:

\(P(Y> y)=\prod_i P(X_i> y)\)

\(F_Y(y)=1-[1-F_X(y)]^n\)

The density function is:

\(f_Y(y)=n[1-F_X(y)]^{n-1}f_X(y)\)
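
For example, the minimum of \(n\) iid Exponential(\(\lambda\)) variables is itself Exponential(\(n\lambda\)). A simulation check (parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, lam = 4, 100000, 2.0

# Minimum of n iid Exponential(lam) draws, repeated many times.
minima = rng.exponential(scale=1.0 / lam, size=(reps, n)).min(axis=1)

y = 0.1
empirical = np.mean(minima <= y)           # P(Y <= y) from simulation
theoretical = 1.0 - np.exp(-n * lam * y)   # CDF of Exponential(n * lam)
print(empirical, theoretical)
```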

If we have a sample of size \(n\), we can create bootstrap samples by drawing \(n\) observations from it with replacement.
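
A minimal sketch of bootstrap resampling, here estimating the standard error of the sample median (the statistic, sample, and number of resamples are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=0.0, scale=1.0, size=100)

B = 2000
boot_stats = np.empty(B)
for b in range(B):
    # Draw n observations with replacement from the original sample.
    sample = rng.choice(x, size=len(x), replace=True)
    boot_stats[b] = np.median(sample)

se_hat = boot_stats.std(ddof=1)   # bootstrap standard error of the median
```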

We have a statistic:

\(S(x_1, x_2,...,x_n)\)

We may want to estimate moments of this statistic, but cannot derive them analytically.

The jackknife is a resampling approach for estimating these moments.

We start by creating \(n\) statistics each leaving out one observation.

\(\bar S_i=S(x_1,x_2,...,x_{i-1},x_{i+1},...,x_n)\)

We define:

\(\bar S=\dfrac{1}{n}\sum_i\bar S_i\)

We want to know the variance.

\(\widehat{Var}(S)=\dfrac{n-1}{n}\sum_i(\bar S_i-\bar S)^2\).
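
A sketch of the jackknife variance estimate, applied to the sample mean (chosen because for the mean the jackknife reproduces the usual estimate \(s^2/n\) exactly, which makes it easy to check):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=50)
n = len(x)

# Leave-one-out statistics: the mean with observation i removed.
S_i = np.array([np.delete(x, i).mean() for i in range(n)])
S_bar = S_i.mean()

# Jackknife variance estimate for the statistic.
var_jack = (n - 1) / n * np.sum((S_i - S_bar) ** 2)

# For the mean this equals the usual estimate s^2 / n.
print(var_jack, x.var(ddof=1) / n)
```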

In the jackknife we calculate the statistic leaving one observation out.

This is the same as weighting observations and giving one a weighting of \(0\) and the others \(1\).

For the infinitesimal jackknife we reduce the weight not to \(0\), but by an infinitesimal amount.
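
A sketch of the infinitesimal-jackknife idea for a weighted statistic: perturb one observation's weight by a small \(\epsilon\) and take a numerical derivative. The weighted mean is used here because its exact derivative, \((x_i-\bar x)/n\), is known and can be checked:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=30)
n = len(x)

def weighted_mean(x, w):
    return np.sum(w * x) / np.sum(w)

# Numerical influence of each observation: nudge its weight by eps
# (the infinitesimal-jackknife direction) instead of setting it to 0.
eps = 1e-6
w = np.ones(n)
influence = np.empty(n)
for i in range(n):
    w_pert = w.copy()
    w_pert[i] += eps
    influence[i] = (weighted_mean(x, w_pert) - weighted_mean(x, w)) / eps

# For the mean, the exact derivative is (x_i - xbar) / n.
print(np.max(np.abs(influence - (x - x.mean()) / n)))
```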