The score is defined as the derivative of the log-likelihood function with respect to \(\theta\):
\(V(\theta, X)=\dfrac{\partial }{\partial \theta }l(\theta ; X)\)
Writing \(L(\theta ; X)=\prod_{i=1}^nP(X_i|\theta )\) for the likelihood, the chain rule gives:
\(V(\theta, X)=\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta}L(\theta; X)\)
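As a quick numeric sanity check (a sketch using an assumed Bernoulli(\(p\)) model, not an example from these notes), the analytic score should match a finite-difference derivative of the log-likelihood:

```python
import numpy as np

# Assumed example: n i.i.d. Bernoulli(p) observations, so
# l(p; X) = sum_i [X_i ln p + (1 - X_i) ln(1 - p)].
rng = np.random.default_rng(0)
p, n = 0.3, 1000
X = rng.binomial(1, p, size=n)

def loglik(p, X):
    return np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))

def score(p, X):
    # dl/dp = sum(X)/p - (n - sum(X))/(1 - p)
    return X.sum() / p - (len(X) - X.sum()) / (1 - p)

eps = 1e-6
fd = (loglik(p + eps, X) - loglik(p - eps, X)) / (2 * eps)
print(score(p, X), fd)  # the two values agree closely
```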
The expectation of the score, given the true value of \(\theta\), is:
\(E[V(\theta, X)]=\int V(\theta, X)\,L(\theta ; X)\,dX\)
\(E[V(\theta, X)]=\int \dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta}L(\theta; X)\,L(\theta ; X)\,dX\)
\(E[V(\theta, X)]=\int \dfrac{\partial }{\partial \theta}L(\theta; X)\,dX\)
Assuming we can exchange differentiation and integration, this is:
\(E[V(\theta, X)]=\dfrac{\partial }{\partial \theta}\int L(\theta; X)\,dX=\dfrac{\partial }{\partial \theta}1=0\)
So the expected value of the score at the true \(\theta\) is \(0\).
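A quick Monte Carlo check of this (again with the assumed Bernoulli(\(p\)) example): averaging the score over many samples drawn at the true parameter should give approximately \(0\):

```python
import numpy as np

# Assumed example: average the Bernoulli(p) score over many samples
# drawn at the true p; the mean should be approximately 0.
rng = np.random.default_rng(1)
p, n, reps = 0.3, 50, 20000
s = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
scores = s / p - (n - s) / (1 - p)
print(scores.mean())  # close to 0
```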
The variance of the score is:
\(var [\dfrac{\partial }{\partial \theta }l(\theta ; X) ]\)
\(var [\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta}L(\theta; X) ]\)
The Fisher information is this variance:
\(I(\theta )=E[(\dfrac{\partial }{\partial \theta }\log f(X, \theta ))^2 |\theta ]\)
Under regularity conditions it can equivalently be written with the second derivative:
\(I(\theta )=-E[\dfrac{\partial^2 }{\partial \theta^2 }\log f(X, \theta ) |\theta ]\)
The variance equals the expectation of the squared score because the score is centred around \(0\): the variance of a mean-zero quantity is just its second moment.
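The two forms can be checked numerically (still the assumed Bernoulli(\(p\)) example, where the closed form is \(I(p)=n/(p(1-p))\)):

```python
import numpy as np

# Assumed example: compare E[score^2], -E[second derivative], and the
# closed-form Bernoulli Fisher information n / (p (1 - p)).
rng = np.random.default_rng(2)
p, n, reps = 0.3, 50, 20000
s = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

score = s / p - (n - s) / (1 - p)            # dl/dp
second = -s / p**2 - (n - s) / (1 - p)**2    # d^2 l / dp^2

print((score**2).mean())   # E[score^2]
print(-second.mean())      # -E[d^2 l / dp^2]
print(n / (p * (1 - p)))   # closed form
```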
With \(k\) parameters, the Fisher information becomes a \(k\times k\) matrix with entries:
\(I(\theta )_{ij}=E[(\dfrac{\partial }{\partial \theta_i}\log f(X, \theta ))(\dfrac{\partial }{\partial \theta_j }\log f(X, \theta ))|\theta ]\)
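For example (a standard worked case, not derived in these notes), a single observation from \(N(\mu ,\sigma^2)\) with \(\theta =(\mu ,\sigma^2)\) has:
\(I(\mu ,\sigma^2)=\begin{pmatrix}\dfrac{1}{\sigma^2}&0\\0&\dfrac{1}{2\sigma^4}\end{pmatrix}\)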
The Fisher information matrix contains information about the population.
The observed Fisher information is the negative of the Hessian of the log-likelihood.
We have:
\(l(\theta |\mathbf X)=\sum_i\ln P(\mathbf x_i|\theta )\)
\(J(\theta^*)=-\nabla \nabla^Tl(\theta|\mathbf X )|_{\theta = \theta^*}\)
The Fisher information is the expected value of this.
\(I(\theta )=E[J(\theta)]\)
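A minimal sketch of computing \(J\) numerically (assuming a \(N(\mu ,\sigma^2)\) sample and a central finite-difference Hessian; the helper names are my own):

```python
import numpy as np

# Assumed example: observed Fisher information J(theta*) as the
# negative Hessian of the normal log-likelihood, approximated by
# central finite differences at the MLE.
rng = np.random.default_rng(3)
X = rng.normal(loc=2.0, scale=1.5, size=500)

def loglik(theta, X):
    mu, sigma2 = theta
    return (-0.5 * len(X) * np.log(2 * np.pi * sigma2)
            - np.sum((X - mu) ** 2) / (2 * sigma2))

def observed_information(theta, X, eps=1e-4):
    """J = -Hessian of the log-likelihood, by central differences."""
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            def f(di, dj):
                t = np.array(theta, dtype=float)
                t[i] += di
                t[j] += dj
                return loglik(t, X)
            H[i, j] = (f(eps, eps) - f(eps, -eps)
                       - f(-eps, eps) + f(-eps, -eps)) / (4 * eps**2)
    return -H

theta_hat = np.array([X.mean(), X.var()])  # MLE of (mu, sigma^2)
print(observed_information(theta_hat, X))
# Diagonal approx. n/sigma^2 and n/(2 sigma^4); off-diagonal near 0.
```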
Two parameters are called orthogonal if their entry in the Fisher information matrix is \(0\).
This means the parameters can be estimated separately: their maximum likelihood estimates are asymptotically independent.
This can be written as a moment condition:
\(E[(\dfrac{\partial }{\partial \theta_i}\log f(X, \theta ))(\dfrac{\partial }{\partial \theta_j }\log f(X, \theta ))|\theta ]=0\)
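As a worked check (a standard example, not from the notes above): for \(N(\mu ,\sigma^2)\) the two score components are \(\dfrac{\partial }{\partial \mu}\log f=\dfrac{X-\mu }{\sigma^2}\) and \(\dfrac{\partial }{\partial \sigma^2}\log f=-\dfrac{1}{2\sigma^2}+\dfrac{(X-\mu )^2}{2\sigma^4}\), so
\(E[\dfrac{X-\mu }{\sigma^2}(-\dfrac{1}{2\sigma^2}+\dfrac{(X-\mu )^2}{2\sigma^4})]=-\dfrac{E[X-\mu ]}{2\sigma^4}+\dfrac{E[(X-\mu )^3]}{2\sigma^6}=0\)
since the first and third central moments of the normal distribution are \(0\). Hence \(\mu\) and \(\sigma^2\) are orthogonal, which is why \(\hat\mu =\bar X\) can be computed without knowing \(\hat\sigma^2\).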