We keep the same linear model as OLS, but now allow the errors to have a general (non-spherical) covariance structure.
\(\mathbf{y}=\mathbf{X}\theta+\boldsymbol{\epsilon}\)
We assume:
\(E[\boldsymbol\epsilon \mid \mathbf{X}]=\mathbf{0}\)
\(\operatorname{Cov}[\boldsymbol\epsilon \mid \mathbf{X}]=\boldsymbol\Omega\), where \(\boldsymbol\Omega\) is positive definite and need not equal \(\sigma^2\mathbf{I}\).
The GLS estimator is:
\(\hat\theta_{GLS} = \operatorname*{arg\,min}_{b}\,(\mathbf{y}-\mathbf{X}b)^T\boldsymbol\Omega^{-1}(\mathbf{y}-\mathbf{X}b)\)
which has the closed-form solution
\(\hat\theta_{GLS}=(\mathbf{X}^T\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol\Omega^{-1}\mathbf{y}\)
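As a sanity check, here is a minimal numpy sketch of the closed-form estimator; the data-generating process (heteroskedastic errors with a diagonal \(\boldsymbol\Omega\)) and all variable names are illustrative assumptions, not part of the derivation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])

# Heteroskedastic errors: Cov[eps | X] = Omega, taken diagonal here so that
# its inverse is cheap to form.
sigma2 = np.exp(X[:, 0])
Omega_inv = np.diag(1.0 / sigma2)
eps = rng.normal(scale=np.sqrt(sigma2))
y = X @ theta_true + eps

# GLS closed form: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
theta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(theta_gls)  # should be close to theta_true
```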
This is the coefficient vector that minimises the squared Mahalanobis distance between \(\mathbf{y}\) and \(\mathbf{X}b\) induced by \(\boldsymbol\Omega\).
It is also equivalent to running OLS on a linearly transformed ("whitened") version of the data, as the derivation below shows.
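To see why, assume \(\boldsymbol\Omega\) is positive definite, so \(\boldsymbol\Omega^{-1}\) factorises as \(\boldsymbol\Omega^{-1}=\mathbf{C}^T\mathbf{C}\) (e.g. via a Cholesky factorisation). Then
\[
(\mathbf{y}-\mathbf{X}b)^T\boldsymbol\Omega^{-1}(\mathbf{y}-\mathbf{X}b)
= (\mathbf{C}\mathbf{y}-\mathbf{C}\mathbf{X}b)^T(\mathbf{C}\mathbf{y}-\mathbf{C}\mathbf{X}b)
= \lVert \tilde{\mathbf{y}}-\tilde{\mathbf{X}}b \rVert^2,
\quad \tilde{\mathbf{y}}=\mathbf{C}\mathbf{y},\ \tilde{\mathbf{X}}=\mathbf{C}\mathbf{X},
\]
so OLS on the whitened data \((\tilde{\mathbf{y}},\tilde{\mathbf{X}})\) returns exactly \(\hat\theta_{GLS}\).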
If \(\boldsymbol\Omega\) is known, we can compute this directly. Generally, however, \(\boldsymbol\Omega\) is unknown, and the GLS estimator is infeasible.
We first run OLS and use its residuals to form a consistent estimate \(\hat{\boldsymbol\Omega}\) of \(\boldsymbol\Omega\).
We then plug \(\hat{\boldsymbol\Omega}\) into the GLS formula, giving the feasible GLS (FGLS) estimator; a sketch of the two-step procedure follows below.
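A hedged numpy sketch of this two-step procedure; the variance model used to build \(\hat{\boldsymbol\Omega}\) (regressing \(\log\hat\epsilon_i^2\) on the covariates) is one common choice assumed purely for illustration, not prescribed by the notes above.

```python
import numpy as np

def fgls(X, y):
    """Two-step feasible GLS under an assumed heteroskedastic variance model."""
    # Step 1: OLS for a consistent (but inefficient) fit and its residuals.
    theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_ols

    # Step 2: build Omega_hat from the residuals: here log(resid^2) is
    # regressed on X to get fitted variances (an illustrative choice).
    gamma, *_ = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)
    sigma2_hat = np.exp(X @ gamma)
    Omega_inv_hat = np.diag(1.0 / sigma2_hat)

    # Step 3: plug the estimated Omega into the GLS formula.
    return np.linalg.solve(X.T @ Omega_inv_hat @ X, X.T @ Omega_inv_hat @ y)
```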
The resulting estimator has the same sandwich structure as before, so its covariance (and hence inference) follows by the same process.
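For reference (a standard result stated here as a reminder; the symbol \(\boldsymbol\Sigma\) for the true error covariance is introduced only for this display), if the weight matrix is \(\boldsymbol\Omega\) while \(\operatorname{Cov}[\boldsymbol\epsilon\mid\mathbf{X}]=\boldsymbol\Sigma\), then
\[
\operatorname{Var}[\hat\theta_{GLS}\mid\mathbf{X}]
= (\mathbf{X}^T\boldsymbol\Omega^{-1}\mathbf{X})^{-1}
  \mathbf{X}^T\boldsymbol\Omega^{-1}\boldsymbol\Sigma\,\boldsymbol\Omega^{-1}\mathbf{X}\,
  (\mathbf{X}^T\boldsymbol\Omega^{-1}\mathbf{X})^{-1},
\]
which collapses to \((\mathbf{X}^T\boldsymbol\Omega^{-1}\mathbf{X})^{-1}\) when \(\boldsymbol\Sigma=\boldsymbol\Omega\).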