We take a sample from the distribution.
A statistic is a function of this sample.
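A minimal sketch of this definition, assuming a standard normal population (the text does not specify one):

```python
import numpy as np

# Draw a sample from an assumed N(0, 1) population.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# A statistic is any function of the sample, e.g. the sample mean
# or the (unbiased) sample variance.
mean_stat = sample.mean()
var_stat = sample.var(ddof=1)
print(mean_stat, var_stat)
```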
\(\mathbf x_i\) and \(\mathbf z_i\) are not independent, so we cannot just estimate \(y_i=\mathbf x_i\theta +\epsilon_i\) and ignore \(\mathbf z_i\).
\(y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i\)
We could estimate this equation with a single ML algorithm.
For example, using LASSO.
However, this would introduce regularization bias into our estimate of \(\theta \).
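A sketch of this naive single-ML route, assuming the linear case \(f(\mathbf x_i,\theta)=\mathbf x_i\theta \) and a simulated data-generating process with \(\theta =1\) and \(g(z)=z^2\) (both are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)                # x and z are not independent
y = 1.0 * x + z**2 + rng.normal(size=n)   # true theta = 1

# Regress y on x together with a flexible basis in z,
# all in one penalized regression.
features = np.column_stack([x, z, z**2])
lasso = Lasso(alpha=0.5).fit(features, y)
theta_hat = lasso.coef_[0]

# The L1 penalty shrinks the coefficient on x, so theta_hat
# tends to sit well below the true value of 1.
print(theta_hat)
```

The same shrinkage that controls overfitting in \(g\) also attenuates the coefficient we actually care about.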
We could iteratively estimate both \(\theta \) and \(g(\mathbf z_i)\).
For example, iterating between OLS for \(\theta \) and random forests for \(g(\mathbf z_i)\).
This would also introduce bias into \(\theta \).
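A sketch of the iterative scheme, under the same illustrative assumptions as above (linear \(f\), \(\theta =1\), \(g(z)=z^2\)); the forest's regularization leaks into the OLS step, so \(\hat\theta \) need not converge to the truth:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 1.0 * x + z**2 + rng.normal(size=n)   # true theta = 1

theta_hat = 0.0
g_hat = np.zeros(n)
for _ in range(10):
    # OLS step: regress y - g_hat on x to update theta.
    resid = y - g_hat
    theta_hat = (x @ resid) / (x @ x)
    # ML step: fit a random forest to y - theta_hat * x as a
    # function of z to update g_hat.
    forest = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=25, random_state=0
    )
    forest.fit(z.reshape(-1, 1), y - theta_hat * x)
    g_hat = forest.predict(z.reshape(-1, 1))
print(theta_hat)
```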
\(f(\hat \theta )\rightarrow^d G \), where \(G\) is some distribution.
Many statistics are asymptotically normally distributed.
This is a result of the central limit theorem.
\(\sqrt n (S - s)\rightarrow^d N(0, \sigma^2) \), where \(s\) is the population value of \(S\).
We have the mean and variance, and know the limiting distribution. This allows us to calculate confidence intervals.
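A sketch of such an interval for a sample mean, assuming an Exponential(1) population (so the true mean is 1); \(S \pm 1.96\,\hat\sigma/\sqrt n\) is the usual 95% normal-approximation interval:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
sample = rng.exponential(scale=1.0, size=n)   # true mean s = 1

S = sample.mean()               # the statistic
sigma_hat = sample.std(ddof=1)  # plug-in estimate of sigma
half_width = 1.96 * sigma_hat / np.sqrt(n)
ci = (S - half_width, S + half_width)
print(ci)
```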