We take a sample from the distribution.
\(x=(x_1, x_2,...,x_n)\)
A statistic is a function of this sample:
\(S=S(x_1, x_2,...,x_n)\).
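As a minimal sketch (the standard normal population here is an assumption for illustration):

```python
import numpy as np

# Draw a sample x = (x_1, ..., x_n); the standard normal population here is
# an assumption purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# A statistic is any function of the sample, e.g. the sample mean.
S = x.mean()
print("S =", S)
```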
Suppose we want to estimate \(\theta\) in
\(y_i=f(\mathbf x_i, \theta) +g(\mathbf z_i) +\epsilon_i\)
Here \(\mathbf x_i\) and \(\mathbf z_i\) are not independent, so we cannot just estimate \(y_i=\mathbf x_i\theta + \epsilon_i\): the omitted \(g(\mathbf z_i)\) is correlated with \(\mathbf x_i\) and would bias \(\hat\theta\).
We could estimate the whole equation with a single ML algorithm.
For example, using LASSO. However, this would introduce bias into our estimate of \(\theta\), since regularization shrinks the coefficients toward zero.
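A minimal simulation sketch of this bias; the data-generating process below (\(\theta = 1\), a nonlinear \(g\), and \(\mathbf x_i\) constructed from \(\mathbf z_i\)) is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 2000
theta = 1.0                                   # true parameter, assumed for the simulation
z = rng.normal(size=(n, 5))
x = z @ np.full(5, 0.5) + rng.normal(size=n)  # x depends on z (not independent)
g_z = np.sin(z[:, 0]) + z[:, 1] ** 2          # nuisance function g(z)
y = theta * x + g_z + rng.normal(size=n)

# Naive approach: one LASSO over x and z jointly. Regularization shrinks the
# coefficient on x (and a linear fit cannot absorb the nonlinear g), so the
# estimate of theta is biased.
model = Lasso(alpha=0.1).fit(np.column_stack([x, z]), y)
print("naive LASSO theta-hat:", model.coef_[0])
```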
We could instead estimate \(\theta\) and \(g(\mathbf z_i)\) iteratively, for example alternating OLS for \(\theta\) with random forests for \(g(\mathbf z_i)\). This would also introduce bias into \(\hat\theta\), because the nuisance fit \(\hat g\) is estimated on the same data and its errors are correlated with \(\mathbf x_i\).
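A rough sketch of such an iterative scheme, again with an invented data-generating process:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
theta = 1.0                                   # true value, assumed for the simulation
z = rng.normal(size=(n, 5))
x = z @ np.full(5, 0.5) + rng.normal(size=n)  # x depends on z
y = theta * x + np.sin(z[:, 0]) + z[:, 1] ** 2 + rng.normal(size=n)

# Alternate a random-forest step for g(z) with an OLS step for theta.
theta_hat = 0.0
for _ in range(10):
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(z, y - theta_hat * x)          # fit g on the current residual
    g_hat = forest.predict(z)
    theta_hat = x @ (y - g_hat) / (x @ x)     # OLS (no intercept) of y - g_hat on x
print("iterative theta-hat:", theta_hat)
```

Because the forest is fit in-sample, \(\hat g\) absorbs part of \(\mathbf x_i\)'s effect through its dependence on \(\mathbf z_i\), which is exactly the bias described above.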
To do inference on \(\hat\theta\) we need its sampling distribution. Typically some suitably scaled function of the estimator converges in distribution,
\(f(\hat \theta )\rightarrow^d G\)
where \(G\) is some distribution.
Many statistics are asymptotically normally distributed. This is a result of the central limit theorem.
For example, for a statistic \(S\) estimating a population value \(s\):
\(\sqrt n\,(S - s)\rightarrow^d N(0, \sigma^2)\)
so that \(S\) is approximately \(N(s, \sigma^2/n)\) in large samples.
We then know the limiting distribution along with its mean and variance, which allows us to calculate confidence intervals.
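A minimal sketch of forming such an interval for a sample mean (the exponential population is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.exponential(scale=2.0, size=n)  # non-normal population with mean s = 2

# CLT: sqrt(n) * (S - s) -> N(0, sigma^2), so S is approx N(s, sigma^2 / n).
S = x.mean()
se = x.std(ddof=1) / np.sqrt(n)         # estimate of sigma / sqrt(n)

z = 1.96                                # Phi^(-1)(0.975), for a 95% interval
print("95% CI for the mean:", (S - z * se, S + z * se))
```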