# Statistics

## Creating statistics

### Creating statistics

We take a sample from the distribution.

$$x=(x_1, x_2,...,x_n)$$

A statistic is a function on this sample.

$$S=S(x_1, x_2,...,x_n)$$
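For instance, the sample mean and the sample maximum are both statistics. A minimal Python sketch (the sample values are illustrative):

```python
# A statistic is any function of the sample alone.
sample = [2.0, 5.0, 1.0, 4.0]

def sample_mean(xs):
    """S(x_1, ..., x_n) = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_max(xs):
    """S(x_1, ..., x_n) = max of x_i."""
    return max(xs)

print(sample_mean(sample))  # 3.0
print(sample_max(sample))   # 5.0
```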

## Moments of statistics

### Bias from single and joint estimation

#### Bias from single estimation

The covariates $$\mathbf x_i$$ and $$\mathbf z_i$$ are not independent, so we cannot consistently estimate $$y_i=\mathbf x_i\theta$$ on its own.

#### Bias from joint estimation

We could estimate our equation with a single ML algorithm.

$$y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i$$

For example, using LASSO.

However this would introduce bias into our estimates for $$\theta$$.

#### Bias from iterative estimation

We could iteratively estimate both $$\theta$$ and $$g(\mathbf z_i)$$.

For example, iteratively doing OLS for $$\theta$$ and random forests for $$g(\mathbf z_i)$$.

This would also introduce bias into $$\theta$$.

## Asymptotic properties of statistics

### Asymptotic distributions

$$f(\hat \theta )\rightarrow^d G$$

Where $$G$$ is some distribution.

### Asymptotic normality

Many statistics are asymptotically normally distributed.

This is a result of the central limit theorem.

For example:

$$\sqrt n (S-s)\rightarrow^d N(0, \sigma^2)$$

Where $$s$$ is the probability limit of $$S$$.

#### Confidence intervals for asymptotically normal statistics

We have the mean and variance, and know the distribution. This allows us to calculate confidence intervals.
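For an asymptotically normal statistic with estimate $$\hat s$$ and standard error $$se$$, the interval is $$\hat s\pm z_{1-\alpha/2}\,se$$. A sketch using only the standard library (the estimate and standard error below are placeholder numbers):

```python
from statistics import NormalDist

def normal_ci(estimate, se, level=0.95):
    """Confidence interval for an asymptotically normal statistic:
    estimate +/- z_{1-alpha/2} * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for level=0.95
    return estimate - z * se, estimate + z * se

lo, hi = normal_ci(3.0, 0.5)  # 95% CI: roughly (2.02, 3.98)
```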

## Order statistics

### Order statistics

#### Defining order statistics

The $$k$$th order statistic is the $$k$$th smallest value in a sample.

$$x_{(1)}$$ is the smallest value in a sample, the minimum.

$$x_{(n)}$$ is the largest value in a sample, the maximum.
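Order statistics are obtained simply by sorting the sample. A minimal sketch (sample values are illustrative):

```python
def order_statistic(xs, k):
    """The k-th order statistic: the k-th smallest value (1-indexed)."""
    return sorted(xs)[k - 1]

sample = [4.0, 1.0, 3.0, 2.0]
print(order_statistic(sample, 1))  # 1.0, the minimum x_(1)
print(order_statistic(sample, 4))  # 4.0, the maximum x_(n)
```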

#### Probability distributions of order statistics

The probability distribution of order statistics depends on the underlying probability distribution.

#### Probability distribution of sample maximum

If we have:

$$Y=\max \mathbf X$$

The probability distribution is:

$$P(Y\le y)=P(X_1\le y, X_2\le y,...,X_n\le y)$$

If these are iid we have:

$$P(Y\le y)=\prod_i P(X_i\le y)$$

$$F_Y(y)=F_X(y)^n$$

The density function is:

$$f_Y(y)=nF_X(y)^{n-1}f_X(y)$$
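As a check, for $$X_i\sim U(0,1)$$ we have $$F_X(y)=y$$, so the maximum satisfies $$P(Y\le y)=y^n$$. A quick Monte Carlo sketch (the sample size, number of trials, and seed are arbitrary choices):

```python
import random

random.seed(0)
n, trials, y = 5, 100_000, 0.8

# Empirical P(max <= y) for n iid Uniform(0,1) draws.
hits = sum(
    max(random.random() for _ in range(n)) <= y
    for _ in range(trials)
)
empirical = hits / trials
theoretical = y ** n  # F_Y(y) = F_X(y)^n = y^n for the uniform CDF
```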

#### Probability distribution of the sample minimum

If we have:

$$Y=\min \mathbf X$$

It is easier to work with the complement:

$$P(Y>y)=P(X_1>y, X_2>y,...,X_n>y)$$

If these are iid we have:

$$P(Y>y)=\prod_i P(X_i>y)$$

$$F_Y(y)=1-[1-F_X(y)]^n$$

The density function is:

$$f_Y(y)=n[1-F_X(y)]^{n-1}f_X(y)$$
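As a check, for $$X_i\sim\text{Exp}(\lambda)$$ we have $$1-F_X(y)=e^{-\lambda y}$$, so the minimum satisfies $$P(Y\le y)=1-e^{-n\lambda y}$$: the minimum of $$n$$ iid exponentials is itself exponential with rate $$n\lambda$$. A quick Monte Carlo sketch (the parameters and seed are arbitrary choices):

```python
import math
import random

random.seed(1)
lam, n, trials, y = 1.0, 4, 100_000, 0.2

# Empirical P(min <= y) for n iid Exponential(lam) draws.
hits = sum(
    min(random.expovariate(lam) for _ in range(n)) <= y
    for _ in range(trials)
)
empirical = hits / trials
# F_Y(y) = 1 - [1 - F_X(y)]^n = 1 - exp(-n * lam * y)
theoretical = 1 - math.exp(-n * lam * y)
```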

## Bootstrapping

### Bootstrapping

If we have a sample of size $$n$$, we can create bootstrap samples by drawing $$n$$ observations with replacement from it.
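Computing a statistic on each bootstrap sample gives an empirical distribution for that statistic, from which we can estimate, say, its standard error. A minimal sketch (the data are simulated and the number of replications is an arbitrary choice):

```python
import random
from statistics import mean, stdev

random.seed(42)
sample = [random.gauss(0, 1) for _ in range(50)]

def bootstrap_se(xs, statistic, reps=2000):
    """Bootstrap standard error: draw reps samples of size n with
    replacement, compute the statistic on each, and take the
    standard deviation across replications."""
    n = len(xs)
    stats = [statistic(random.choices(xs, k=n)) for _ in range(reps)]
    return stdev(stats)

se = bootstrap_se(sample, mean)
# For the sample mean this should be close to stdev(xs) / sqrt(n).
```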

## Jackknifing

### The jackknife

We have a statistic:

$$S(x_1, x_2,...,x_n)$$

We may want to estimate moments for this statistic, but are unable to do so.

#### The jackknife estimator

The jackknife is an approach for getting moments for statistics.

We start by creating $$n$$ statistics each leaving out one observation.

$$\bar S_i=S(x_1,x_2,...,x_{i-1},x_{i+1},...,x_n)$$

We define:

$$\bar S=\dfrac{1}{n}\sum_i\bar S_i$$

#### Moments of the jackknife estimator

We want to estimate the variance of $$S$$. The jackknife estimate is:

$$\widehat{Var}(S)=\dfrac{n-1}{n}\sum_i(\bar S_i-\bar S)^2$$
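A minimal sketch of the estimator, using the sample mean as the statistic (the data values are illustrative); for the mean, the jackknife variance reproduces $$s^2/n$$ exactly:

```python
from statistics import mean

def jackknife_variance(xs, statistic):
    """Jackknife variance estimate:
    (n-1)/n * sum_i (S_i - S_bar)^2,
    where S_i is the statistic with observation i left out."""
    n = len(xs)
    leave_one_out = [statistic(xs[:i] + xs[i + 1:]) for i in range(n)]
    s_bar = mean(leave_one_out)
    return (n - 1) / n * sum((s - s_bar) ** 2 for s in leave_one_out)

xs = [1.0, 2.0, 4.0, 7.0]
print(jackknife_variance(xs, mean))  # 1.75, which equals s^2 / n here
```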

### The infinitesimal jackknife

#### The jackknife as a weighting

In the jackknife we calculate the statistic leaving one observation out.

This is the same as weighting observations and giving one a weighting of $$0$$ and the others $$1$$.

#### The infinitesimal jackknife

For the infinitesimal jackknife we reduce the weight not to $$0$$, but by an infinitesimal amount.
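One way to sketch this numerically: write the statistic as a function of observation weights, approximate each influence $$U_i=\partial S/\partial w_i$$ at equal weights by a small finite-difference perturbation, and estimate the variance as $$\sum_i U_i^2$$. This is a simplified illustration, not a full treatment (the weighted-mean statistic and step size below are assumptions for the example):

```python
def weighted_mean(xs, w):
    """The sample mean written as a function of observation weights."""
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)

def ij_variance(xs, statistic, eps=1e-6):
    """Infinitesimal-jackknife sketch: nudge each weight by eps,
    approximate the influence U_i = dS/dw_i by a finite difference,
    and use Var ~ sum_i U_i^2."""
    n = len(xs)
    base = statistic(xs, [1.0] * n)
    influences = []
    for i in range(n):
        w = [1.0] * n
        w[i] += eps
        influences.append((statistic(xs, w) - base) / eps)
    return sum(u * u for u in influences)

xs = [1.0, 2.0, 4.0, 7.0]
# For the mean, each influence is (x_i - xbar) / n, so the estimate
# is sum_i (x_i - xbar)^2 / n^2.
print(ij_variance(xs, weighted_mean))
```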