Creating statistics


We take a sample from the distribution.

\(x=(x_1, x_2,...,x_n)\)

A statistic is a function of this sample.

\(S=S(x_1, x_2,...,x_n)\).
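As a concrete sketch (the standard normal distribution and the sample mean here are assumptions for illustration, not part of the notes):

```python
import numpy as np

# Draw a sample x = (x_1, ..., x_n) from a distribution and evaluate
# a statistic S(x_1, ..., x_n) on it. N(0, 1) and the sample mean are
# illustrative choices only.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)

def S(sample):
    """A statistic: any function of the sample alone (here, the mean)."""
    return sample.mean()

print(S(x))  # close to 0, the population mean
```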

Moments of statistics

Bias from single and joint estimation

Bias from single estimation

\(\mathbf x_i\) and \(\mathbf z_i\) are not independent, so we cannot estimate \(y_i=\mathbf x_i\theta \) on its own: the omitted term involving \(\mathbf z_i\) would bias our estimate of \(\theta \).
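A small numerical sketch of this bias, under an assumed data-generating process (the choices \(g(z)=\sin(z)\), the correlation structure, and \(\theta=2\) are all illustrative assumptions):

```python
import numpy as np

# Assumed DGP: y_i = x_i*theta + g(z_i) + eps_i, with x_i and z_i
# correlated. Regressing y on x alone absorbs part of g(z) into the
# estimate of theta, producing bias.
rng = np.random.default_rng(1)
n, theta = 5000, 2.0
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)                 # x and z are not independent
y = theta * x + np.sin(z) + rng.normal(size=n)   # g(z) = sin(z), an assumption

# OLS of y on x only: theta_hat = cov(x, y) / var(x)
theta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(theta_hat)  # noticeably above theta = 2.0: the omitted g(z) biases it
```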

Bias from joint estimation

We could estimate our equation with a single ML algorithm.

\(y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i\)

For example, using LASSO.

However, this would introduce bias into our estimates of \(\theta \).

Bias from iterative estimation

We could iteratively estimate both \(\theta \) and \(g(\mathbf z_i)\).

For example, iteratively doing OLS for \(\theta \) and random forests for \(g(\mathbf z_i)\).

This would also introduce bias into \(\theta \).
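A self-contained sketch of such an iterative (backfitting-style) scheme. The procedure and data-generating process here are assumptions for illustration, and binned means stand in for the random forest mentioned above:

```python
import numpy as np

# Assumed DGP: y_i = x_i*theta + sin(z_i) + eps_i, x and z correlated.
# Alternate between an OLS step for theta on y - g_hat(z) and a crude
# nonparametric step for g on y - x*theta_hat.
rng = np.random.default_rng(2)
n, theta = 5000, 2.0
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = theta * x + np.sin(z) + rng.normal(size=n)

def binned_mean_fit(z, target, n_bins=50):
    """Crude nonparametric regression: average the target within z-bins."""
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(z, edges[1:-1]), 0, n_bins - 1)
    means = np.array([target[idx == b].mean() for b in range(n_bins)])
    return means[idx]

theta_hat, g_hat = 0.0, np.zeros(n)
for _ in range(20):
    # OLS step for theta, holding g_hat fixed
    r = y - g_hat
    theta_hat = np.cov(x, r)[0, 1] / np.var(x, ddof=1)
    # nonparametric step for g, holding theta_hat fixed
    g_hat = binned_mean_fit(z, y - theta_hat * x)

print(theta_hat)  # near theta = 2.0 in this easy setting
```

In this simple simulation the iteration lands close to the truth; the bias the notes refer to comes from regularisation in the nonparametric step, which this crude smoother does not fully exhibit.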

Asymptotic properties of statistics

Asymptotic distributions

\(f(\hat \theta )\rightarrow^d G \)

Where \(G\) is some distribution.

Asymptotic mean and variance

Asymptotic normality

Many statistics are asymptotically normally distributed.

This is a result of the central limit theorem.

For example:

\(\sqrt n (S-s)\rightarrow^d N(0, \sigma^2) \)

Where \(s\) is the population value of the statistic.
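This can be checked numerically. The Exponential(1) distribution and the sample mean as the statistic are assumptions for the example:

```python
import numpy as np

# CLT check: for i.i.d. draws with mean mu and variance sigma^2,
# sqrt(n) * (sample mean - mu) is approximately N(0, sigma^2).
rng = np.random.default_rng(3)
n, reps = 400, 2000
mu, sigma = 1.0, 1.0                 # Exponential(1): mean 1, variance 1
samples = rng.exponential(scale=1.0, size=(reps, n))
stats = np.sqrt(n) * (samples.mean(axis=1) - mu)

print(stats.mean(), stats.std())     # close to 0 and sigma respectively
```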

Confidence intervals for asymptotically normal statistics

We have the mean and variance, and know the distribution. This allows us to calculate confidence intervals.
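For instance, a 95% interval \(S \pm 1.96\,\hat\sigma/\sqrt n\) from the normal approximation (the distribution, the sample mean as the statistic, and the plug-in standard error are assumptions for the example):

```python
import numpy as np

# 95% confidence interval for an asymptotically normal statistic:
# S +/- 1.96 * sigma_hat / sqrt(n), using the sample mean and the
# sample standard deviation as plug-ins.
rng = np.random.default_rng(4)
n = 1000
x = rng.normal(loc=5.0, scale=2.0, size=n)

s_hat = x.mean()
se = x.std(ddof=1) / np.sqrt(n)
ci = (s_hat - 1.96 * se, s_hat + 1.96 * se)
print(ci)  # covers the true mean 5.0 in roughly 95% of repeated samples
```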