We have: \(\hat{\theta }=(X^TX)^{-1}X^Ty\)
Let’s take the expectation.
\(E[\hat{\theta }]=E[(X^TX)^{-1}X^Ty]\)
Let’s model \(y\) as a function of \(X\). As we place no restrictions on the error terms, this is not an assumption.
\(y=X\theta +\epsilon\).
\(E[\hat{\theta }]=E[(X^TX)^{-1}X^T(X\theta +\epsilon)]\)
\(E[\hat{\theta }]=E[(X^TX)^{-1}X^TX\theta ]+E[(X^TX)^{-1}X^T\epsilon ]\)
\(E[\hat{\theta }]=\theta +E[(X^TX)^{-1}X^T\epsilon ]\)
\(E[\hat{\theta }]=\theta +E[(X^TX)^{-1}X^T]E[\epsilon ]+cov[(X^TX)^{-1}X^T,\epsilon ]\)
\(E[\epsilon ]=0\)
This means that:
\(E[\hat{\theta }]=\theta + cov [(X^TX)^{-1}X^T ,\epsilon]\)
If the errors are mean-independent of \(X\) (exogeneity), then \(E[\epsilon|X]=0\), the covariance term vanishes, and therefore:
\(E[\hat{\theta }]=\theta\)
So OLS is an unbiased estimator, so long as the exogeneity condition holds.
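As a check, here is a minimal simulation sketch (not from the derivation itself) of this unbiasedness result: with mean-zero errors drawn independently of \(X\), the average of \(\hat\theta\) over many draws should recover \(\theta\). All concrete values (`n`, `p`, `theta_true`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_trials = 200, 3, 5000
theta_true = np.array([1.0, -2.0, 0.5])

estimates = np.empty((n_trials, p))
for t in range(n_trials):
    X = rng.normal(size=(n, p))   # regressors, independent of the errors
    eps = rng.normal(size=n)      # mean-zero errors, so E[eps | X] = 0
    y = X @ theta_true + eps
    # theta_hat = (X^T X)^{-1} X^T y, computed via a stable solve
    estimates[t] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))  # should be close to theta_true
```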
We know:
\(\hat \theta =(X^TX)^{-1}X^Ty\)
\(y=X\theta +\epsilon\)
Therefore:
\(\hat \theta =(X^TX)^{-1}X^T(X\theta +\epsilon)\)
\(\hat \theta =\theta +(X^TX)^{-1}X^T\epsilon\)
\(\hat \theta -\theta =(X^TX)^{-1}X^T\epsilon\)
\(Var [\hat \theta ]=E[(\hat \theta -\theta)(\hat \theta -\theta )^T]\)
\(Var [\hat \theta ]=E[(X^TX)^{-1}X^T\epsilon ((X^TX)^{-1}X^T\epsilon )^T]\)
\(Var [\hat \theta ]=E[(X^TX)^{-1}X^T\epsilon \epsilon^T X(X^TX)^{-1}]\)
Treating \(X\) as fixed, we can move it outside the expectation:
\(Var [\hat \theta ]=(X^TX)^{-1}X^TE[\epsilon \epsilon^T ]X(X^TX)^{-1}\)
We write:
\(\Omega=E[\epsilon \epsilon^T]\)
\(Var [\hat \theta ]=(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}\)
Depending on how we estimate \(\Omega\), we get different variance terms.
If the errors are IID with variance \(\sigma^2_\epsilon\):
\(\Omega = I\sigma^2_{\epsilon }\)
\(Var [\hat \theta ]=(X^TX)^{-1}X^TI\sigma^2_{\epsilon } X(X^TX)^{-1}\)
\(Var [\hat \theta ]=\sigma^2_\epsilon (X^TX)^{-1}\)
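A quick sketch, with made-up data, comparing this analytic variance \(\sigma^2_\epsilon (X^TX)^{-1}\) to the empirical covariance of \(\hat\theta\) over repeated error draws with \(X\) held fixed; `n`, `p`, `sigma` and `theta_true` are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma, n_trials = 100, 2, 1.5, 20000
X = rng.normal(size=(n, p))       # fixed design across all trials
theta_true = np.array([2.0, -1.0])

XtX_inv = np.linalg.inv(X.T @ X)
analytic = sigma**2 * XtX_inv     # Var[theta_hat] under IID errors

draws = np.empty((n_trials, p))
for t in range(n_trials):
    y = X @ theta_true + rng.normal(scale=sigma, size=n)
    draws[t] = XtX_inv @ X.T @ y

print(analytic)
print(np.cov(draws.T))            # should closely match `analytic`
```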
If the errors are independent but heteroscedastic, we can instead estimate \(\Omega\) from the residuals:
\(\hat \Omega_{ij}=\delta_{ij}\hat \epsilon_i^2\)
\(Var [\hat \theta ]=(X^TX)^{-1}X^T\hat \Omega X(X^TX)^{-1}\)
These are also known as the Eicker-Huber-White standard errors, or the White correction.
These are also referred to as robust standard errors.
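A hand-rolled sketch of this sandwich estimator, under an assumed heteroscedastic data-generating process (the error scale grows with the regressor); the data and parameter values are illustrative rather than from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(scale=x, size=n)   # variance grows with x: heteroscedastic
y = X @ np.array([1.0, 2.0]) + eps

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta_hat

XtX_inv = np.linalg.inv(X.T @ X)
# Classical variance: assumes Omega = sigma^2 I
sigma2_hat = resid @ resid / (n - X.shape[1])
var_classical = sigma2_hat * XtX_inv
# White sandwich: Omega_hat = diag(resid_i^2)
meat = X.T @ (resid[:, None]**2 * X)   # X^T diag(e^2) X without forming diag
var_white = XtX_inv @ meat @ XtX_inv

print(np.sqrt(np.diag(var_classical)))  # misstates the uncertainty here
print(np.sqrt(np.diag(var_white)))      # robust to the heteroscedasticity
```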
To summarise the OLS results so far:
\(\hat \theta_{OLS}=(X^TX)^{-1}X^Ty\)
\(E[\hat \theta_{OLS}]=\theta\)
\(Var[\hat \theta_{OLS}]=\sigma^2 (X^TX)^{-1}\) (under IID errors)
Now let’s derive the Maximum Likelihood Estimator (MLE) for the same model and compare:
\(y_i=\mathbf x_i\theta +\epsilon_i\)
\(P(y=y_i|x=x_i)=P(\epsilon_i=y_i-\mathbf x_i \theta )\)
If we assume \(\epsilon_i \sim N(0, \sigma^2_\epsilon )\) we have:
\(P(y=y_i|x=x_i)=\dfrac{1}{\sqrt {2\pi \sigma^2_\epsilon }}e^{-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}}\)
\(L(X, \theta )=\prod_{i=1}^n\dfrac{1}{\sqrt {2\pi \sigma^2_\epsilon }}e^{-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}}\)
\(l(X, \theta )=\sum_{i=1}^n\left[-\dfrac{1}{2}\ln (2\pi \sigma_\epsilon^2)-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}\right]\)
\(\dfrac{\partial l}{\partial \theta_j }=\sum_{i=1}^n\dfrac{x_{ij}(y_i-\mathbf x_{i}\theta )}{\sigma^2_\epsilon}\)
Setting the derivative to zero at \(\hat \theta_{MLE}\):
\(\sum_{i=1}^nx_{ij}(y_i-\mathbf x_{i}\hat \theta_{MLE} )=0\)
In matrix form:
\(X^T(y-X\hat \theta_{MLE} )=0\)
\(X^Ty=X^TX\hat \theta_{MLE}\)
\(\hat \theta_{MLE}=(X^TX)^{-1}X^Ty\)
If the errors are IID and normally distributed, then:
\(\hat \theta_{OLS}=\hat \theta_{MLE}\)
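A small sketch, with illustrative data, checking this equivalence numerically: minimising the negative Gaussian log-likelihood with `scipy.optimize.minimize` should land on the OLS closed form.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p = 300, 2
X = rng.normal(size=(n, p))
y = X @ np.array([0.7, -1.3]) + rng.normal(size=n)

def neg_log_likelihood(theta, sigma2=1.0):
    # Negative of the Gaussian log-likelihood l(X, theta) above
    r = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + r @ r / (2 * sigma2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(p)).x
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_mle, theta_ols)  # agree up to optimiser tolerance
```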
The Gauss-Markov assumptions on the errors, and the bias each violation introduces:
Zero-mean errors: if the model should for some reason only have errors on the upside or the downside, OLS will not provide this.
Homoscedasticity (all errors have the same variance): violating this does not bias the coefficient estimates, but the estimated variances and standard errors are biased.
Uncorrelated errors: serial correlation suggests the model is missing structure, e.g. lagged variables that should be added.
Under normally distributed errors, OLS is the best unbiased estimator (BUE). For non-normally distributed errors, the Gauss-Markov theorem still guarantees that OLS is the best linear unbiased estimator (BLUE).
Noise in \(y\) doesn’t cause bias: it is simply absorbed into the error term. Noise in \(x\) does cause bias, attenuating the estimated coefficients towards zero, and needs to be corrected for.
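A simulation sketch of this asymmetry (the noise scales and sample sizes are illustrative assumptions): adding measurement noise to \(y\) leaves the OLS slope unbiased, while adding it to \(x\) shrinks the slope towards zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_trials, slope = 500, 2000, 2.0

slopes_noisy_y, slopes_noisy_x = [], []
for _ in range(n_trials):
    x = rng.normal(size=n)
    y = slope * x + rng.normal(size=n)
    # Case 1: extra measurement noise on y only
    y_obs = y + rng.normal(size=n)
    slopes_noisy_y.append(x @ y_obs / (x @ x))
    # Case 2: measurement noise on x only
    x_obs = x + rng.normal(size=n)
    slopes_noisy_x.append(x_obs @ y / (x_obs @ x_obs))

print(np.mean(slopes_noisy_y))  # ~2.0: unbiased
print(np.mean(slopes_noisy_x))  # ~1.0: attenuated by Var(x)/(Var(x)+Var(noise))
```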
Causality vs correlation: a model that captures only correlation can perform badly out of sample when the underlying relationships shift.
Regression alone cannot establish the direction of causation, for example distinguishing "the disease causes the symptom" from "the symptom causes the disease".
Linear models can be algebraically rearranged to put any variable on the left-hand side, so the fitted equation itself implies no causal direction.