We have a likelihood function of the data.
\(L(\theta ; X)=P(X|\theta )\)
We choose values for \(\theta\) which maximise the likelihood function.
\(\arg\max_\theta P(X|\theta )\)
That is, for which values of \(\theta\) was the observation we saw most likely?
This is a mode estimate: it picks the value of \(\theta\) at which the likelihood function peaks.
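If the observations \(x_i\) are independent, the likelihood factorises into a product over observations:
\(L(\theta ; X)=\prod_i P(x_i|\theta )\)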
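We can take logarithms. The logarithm is strictly increasing, so the log-likelihood is maximised at the same values of \(\theta\) as the likelihood itself. The logarithm is only defined for values above \(0\), so this applies wherever the likelihood is non-zero; values of \(\theta\) that give the data zero probability cannot maximise a likelihood that is positive somewhere, so no solutions are lost.
We therefore look for the stationary points, at non-zero likelihood, of: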
\(\ln L(\theta ; X)=\ln \prod_i P(x_i|\theta )\)
\(\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )\)
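The equivalence of the two forms can be checked numerically. The sketch below assumes a Bernoulli model in which each observation is `1` with probability `theta`. The data, the grid search and the function names `likelihood` and `log_likelihood` are illustrative choices, not part of the notes above.

```python
import math

def likelihood(theta, xs):
    """Product of P(x_i | theta) for Bernoulli data (1 or 0)."""
    result = 1.0
    for x in xs:
        result *= theta if x == 1 else 1.0 - theta
    return result

def log_likelihood(theta, xs):
    """Sum of ln P(x_i | theta) for the same data."""
    total = 0.0
    for x in xs:
        total += math.log(theta if x == 1 else 1.0 - theta)
    return total

# Illustrative data: six 1s and four 0s.
xs = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

# Candidate values of theta strictly inside (0, 1), so every log is defined.
grid = [k / 100 for k in range(1, 100)]

best_by_product = max(grid, key=lambda t: likelihood(t, xs))
best_by_log_sum = max(grid, key=lambda t: log_likelihood(t, xs))

print(best_by_product, best_by_log_sum)
```

Both searches pick the same grid point. The sum of logs is usually preferred in practice because a product of many probabilities can underflow to zero in floating point.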
Let’s take our simple example about coins. Heads and tails are the only options, so \(P(H)+P(T)=1\).
\(P(H|\theta )=\theta\)
\(P(T|\theta )=1-\theta\)
\(\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )\)
If we had \(5\) heads and \(5\) tails we would have:
\(\ln L(\theta ; X)=5\ln (\theta )+ 5\ln (1-\theta )\)
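The maximiser of this expression is found by setting its derivative with respect to \(\theta\) to zero. A short worked derivation follows; the second-derivative check is added here for completeness.

```latex
\begin{align*}
\frac{d}{d\theta}\ln L(\theta ; X) &= \frac{5}{\theta} - \frac{5}{1-\theta} = 0 \\
5(1-\theta) &= 5\theta \\
\theta &= \frac{1}{2} \\
\frac{d^2}{d\theta^2}\ln L(\theta ; X) &= -\frac{5}{\theta^2} - \frac{5}{(1-\theta)^2} < 0
\end{align*}
```

The second derivative is negative for every \(\theta\) in \((0,1)\), so the stationary point is a maximum.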
So \(P(H)=\dfrac{1}{2}\) is the value which makes our observation most likely.
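The same result can be reproduced numerically. This is a minimal sketch that assumes a plain grid search over \(\theta\) is accurate enough; the function name `coin_log_likelihood` and the grid resolution are illustrative.

```python
import math

def coin_log_likelihood(theta, heads=5, tails=5):
    """Log-likelihood of observing the given counts of heads and tails."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

# Candidate values strictly inside (0, 1), where both logarithms are defined.
grid = [k / 1000 for k in range(1, 1000)]

theta_hat = max(grid, key=coin_log_likelihood)
print(theta_hat)  # 0.5
```

For other counts, the maximiser is the sample proportion of heads, `heads / (heads + tails)`, which is what solving the derivative condition gives in general.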
For a multivariate normal distribution, the parameters are the population mean vector and the covariance matrix.
The maximum likelihood estimator (MLE) for the mean is the sample mean.
The MLE for the covariance matrix is the unadjusted sample covariance, which divides by \(n\) rather than \(n-1\).
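A minimal sketch of these two estimators, assuming multivariate normal data and NumPy. The simulated data, the seed and the variable names are illustrative. The adjusted covariance is computed alongside only to show the difference in the divisor.

```python
import numpy as np

# Simulated data: 500 draws from a 2-dimensional normal (illustrative values).
rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.6],
                     [0.6, 1.0]])
X = rng.multivariate_normal(true_mean, true_cov, size=500)

n = X.shape[0]

# MLE of the mean: the sample mean of each column.
mean_mle = X.mean(axis=0)

# MLE of the covariance: centred cross-products divided by n (unadjusted).
centered = X - mean_mle
cov_mle = centered.T @ centered / n

# The adjusted (unbiased) sample covariance divides by n - 1 instead.
cov_adjusted = centered.T @ centered / (n - 1)

print(mean_mle)
print(cov_mle)
```

`np.cov(X, rowvar=False, bias=True)` returns the same matrix as `cov_mle`. The default `bias=False` gives the adjusted version.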
We can partition the likelihood function into parts, including one that depends only on the variance.
Existing score: rename as the maximum likelihood score.
The MLE is bad if the true \(\theta\) is not where the score is \(0\).
E.g. with one-sided tails, the true \(\theta\) is not at the MLE condition.
Can we find other scores?
The score of one parameter depends on the other parameters.
If we misestimate one parameter and then estimate another, we will get a bad answer.
We want the score not to change around bad estimates.
We want bias in the nuisance parameters not to affect the score.
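One way to see this dependence is a small sketch under assumptions that are not in the notes above: a normal model with the mean as the nuisance parameter and the variance as the parameter of interest. With the mean fixed at some value `mu`, the variance score is zero at the average squared deviation from `mu`, so an error in the mean feeds directly into the variance estimate. The data and names are illustrative.

```python
import random

# Illustrative data from a normal distribution with mean 3 and sd 2.
random.seed(0)
xs = [random.gauss(3.0, 2.0) for _ in range(1000)]
n = len(xs)
x_bar = sum(xs) / n

def variance_given_mean(mu):
    """Variance at which the variance score is zero when the mean is fixed at mu."""
    return sum((x - mu) ** 2 for x in xs) / n

# Variance estimate when the nuisance parameter is at its own MLE.
var_at_mle_mean = variance_given_mean(x_bar)

# Variance estimate when the nuisance parameter is misestimated by 1.0.
mu_wrong = x_bar + 1.0
var_at_wrong_mean = variance_given_mean(mu_wrong)

print(var_at_mle_mean, var_at_wrong_mean)
```

The second estimate exceeds the first by the squared error in the mean (here \(1.0\)), up to floating-point rounding. This is the kind of sensitivity that a score unaffected by nuisance parameter error would avoid.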
Separate page for orthogonality for sets of parameters, e.g. nuisance parameters and parameters of interest.