When we compute statistics from a sample, we are often concerned with inferring properties of the underlying probability distribution.
Because the properties of the probability distribution affect the chance of observing the sample, we can analyse samples to infer properties of the underlying distribution.
There are many properties we could be interested in, including moments and the parameters of a specific probability distribution function.
An estimator is a statistic which is our estimate of one of these values.
Statistics and estimators are different things: a statistic may be a terrible estimator of a given parameter, yet still be useful for other purposes.
We can make estimates of a population parameter using statistics from the sample.
A statistic is sufficient for a parameter if it contains all the information the sample provides about that parameter.
We can describe the distribution of the sample \(x\), given the parameter \(\theta\) and a statistic \(t\), as:
\(P(x|\theta, t)\)
\(t\) is a sufficient statistic for \(\theta\) if:
\(P(x|t)=P(x|\theta, t)\)
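For example, for \(n\) i.i.d. Bernoulli trials with parameter \(\theta\), the number of successes \(t=\sum_i x_i\) is sufficient for \(\theta\): given \(t\), every arrangement of the successes is equally likely, so \(P(x|t)=\dfrac{1}{\binom{n}{t}}\), which does not depend on \(\theta\).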
The error of an estimator is the difference between it and the actual parameter.
\(Error_{\theta }[\hat \theta ]=\hat \theta - \theta\)
The bias of an estimator is the expected error.
\(Bias_\theta [\hat \theta ]:=E_\theta [\hat \theta -\theta ]\)
By linearity of expectation, this is equivalent to:
\(Bias_\theta [\hat \theta ]=E_\theta [\hat \theta ]-\theta\)
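For example, the uncorrected sample variance \(\hat \sigma ^2=\dfrac{1}{n}\sum_i (x_i-\bar x)^2\) has \(E[\hat \sigma ^2]=\dfrac{n-1}{n}\sigma ^2\), giving a bias of \(-\dfrac{\sigma ^2}{n}\).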
The Mean Squared Error (MSE) of an estimator is the expected squared error. We can decompose it into variance and bias:
\(MSE = E[(\hat \theta - \theta )^2]=E[((\hat \theta - E[\hat \theta ])+(E[\hat \theta ]-\theta ))^2]\)
\(MSE = E[(\hat \theta - \theta )^2]=E[(\hat \theta - E[\hat \theta ])^2+(E[\hat \theta ]-\theta )^2+2(E[\hat \theta ]-\theta )(\hat \theta- E[\hat \theta ])]\)
\(MSE = E[(\hat \theta - \theta )^2]=E[(\hat \theta - E[\hat \theta ])^2]+E[(E[\hat \theta ]-\theta)^2] +E[2(E[\hat \theta ]-\theta )(\hat \theta- E[\hat \theta ])]\)
\(MSE = E[(\hat \theta - \theta )^2]=Var(\hat \theta )+(E[\hat \theta ]-\theta)^2 +2(E[\hat \theta ]-\theta )E[\hat \theta- E[\hat \theta ]]\)
The final term vanishes, because \(E[\hat \theta - E[\hat \theta ]]=0\), leaving:
\(MSE = E[(\hat \theta - \theta )^2]=Var(\hat \theta )+Bias (\hat \theta )^2\)
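For an unbiased estimator the bias term is \(0\), so the MSE equals the variance. For example, the sample mean of \(n\) i.i.d. draws with variance \(\sigma ^2\) is unbiased, so \(MSE(\bar x)=Var(\bar x)=\dfrac{\sigma ^2}{n}\).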
The Root Mean Squared Error (RMSE) is the square root of the MSE.
It is also called the Root Mean Square Deviation (RMSD).
A statistic \(\hat \theta\) is a consistent estimator for \(\theta\) if its error tends to \(0\) in probability as the sample size \(n\) grows.
That is:
\(\hat \theta \xrightarrow{p} \theta\)
One way to show that an estimator is consistent is to bound \(\hat \theta -\theta\) by a function of \(n\) that tends to \(0\) as \(n\) grows, for example by showing that both the bias and the variance vanish.
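For example, the sample mean \(\bar x\) is unbiased for the population mean \(\mu\) and has variance \(\dfrac{\sigma ^2}{n}\rightarrow 0\), so by Chebyshev's inequality \(\bar x \xrightarrow{p} \mu\).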
Efficiency measures the speed at which a consistent estimator tends towards the true value. Note that an estimator can be fairly efficient and still be biased.
Efficiency is measured as:
\(e(\hat \theta )=\dfrac{\dfrac{1}{I(\theta )}}{Var (\hat \theta )}\)
If an estimator has an efficiency of \(1\) and is unbiased, it is efficient.
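For example, for i.i.d. \(N(\mu ,\sigma ^2)\) data the sample mean has variance \(\dfrac{\sigma ^2}{n}\), which equals the Cramér-Rao bound \(\dfrac{1}{I(\mu )}\) (the Fisher information for \(\mu\) in a sample of size \(n\) is \(\dfrac{n}{\sigma ^2}\)), so the sample mean is an efficient estimator of \(\mu\).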
We can measure the relative efficiency of two consistent estimators:
The relative efficiency is the variance of the first estimator, divided by the variance of the second.
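That is:
\(e(\hat \theta _1, \hat \theta _2)=\dfrac{Var(\hat \theta _1)}{Var(\hat \theta _2)}\)
For example, for normal data the sample median has asymptotic variance \(\dfrac{\pi \sigma ^2}{2n}\), against \(\dfrac{\sigma ^2}{n}\) for the sample mean, so the mean is the more efficient of the two by a factor of \(\dfrac{\pi }{2}\approx 1.57\).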
An estimator is root-n consistent if it is consistent and its variance is:
\(O(\dfrac{1}{n})\)
A consistent estimator is \(n^\delta\)-consistent if its variance is:
\(O(\dfrac{1}{n^{2 \delta }})\)
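For example, taking \(\delta =\dfrac{1}{2}\) recovers root-n consistency: the variance is \(O(\dfrac{1}{n})\), so the standard deviation shrinks like \(\dfrac{1}{\sqrt n}\).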
For an unbiased estimator, the variance cannot be below the Cramér-Rao lower bound.
\(Var (\hat \theta )\ge \dfrac{1}{I(\theta )}\)
Where \(I(\theta )\) is the Fisher information.
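The Fisher information is the variance of the score \(V\), defined below:
\(I(\theta )=E[(\dfrac{\partial }{\partial \theta }\ln f(X, \theta ))^2]\)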
We can prove this. First, define the score:
\(V=\dfrac{\partial }{\partial \theta }\ln f(X, \theta )\)
\(V=\dfrac{1}{f(X, \theta )}\dfrac{\partial }{\partial \theta } f(X, \theta )\)
The expectation of the score is \(0\):
\(E[V]=E[\dfrac{1}{f(X, \theta )}\dfrac{\partial }{\partial \theta } f(X, \theta )]\)
\(E[V]=\int \dfrac{1}{f(x, \theta )}\dfrac{\partial }{\partial \theta } f(x, \theta )f(x, \theta )dx=\int \dfrac{\partial }{\partial \theta } f(x, \theta )dx\)
Assuming we can interchange differentiation and integration:
\(E[V]=\dfrac{\partial }{\partial \theta }\int f(x, \theta )dx=\dfrac{\partial }{\partial \theta }1=0\)
Bias-variance trade-off: if we care about a squared loss such as \(E[(y-x\hat \theta )^2]\), we may not want an unbiased estimator. By accepting some bias we can sometimes reduce the variance by a lot, lowering the overall MSE.
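As a minimal sketch of this trade-off (the shrinkage factor \(0.7\), and all parameter values, are arbitrary choices for illustration), we can compare the sample mean with a version shrunk towards zero:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.5, 2.0, 10, 100_000  # arbitrary illustrative values

# Each row is one simulated dataset of n observations.
samples = rng.normal(mu, sigma, size=(trials, n))
mean_est = samples.mean(axis=1)   # unbiased estimator of mu
shrunk_est = 0.7 * mean_est       # biased estimator: shrunk towards 0

for name, est in [("sample mean", mean_est), ("shrunk mean", shrunk_est)]:
    bias = est.mean() - mu
    var = est.var()
    mse = ((est - mu) ** 2).mean()
    print(f"{name}: bias={bias:+.4f} var={var:.4f} mse={mse:.4f}")
```

The shrunk estimator is biased towards zero, but its variance is cut by roughly half, so its MSE comes out lower for these parameter values.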
To assess estimators of parametric models, we can run Monte Carlo simulations: draw many samples from a known distribution, apply the estimator to each, and examine the empirical bias, variance, and MSE.
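A minimal sketch of such a simulation (all parameter values are arbitrary choices for illustration), here assessing the uncorrected \(\dfrac{1}{n}\) variance estimator and checking the \(MSE=Var+Bias^2\) decomposition empirically:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n, trials = 4.0, 20, 200_000  # arbitrary illustrative values

# Draw many independent samples from a known distribution.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

# Uncorrected variance estimator (numpy's default ddof=0 divides by n);
# its theoretical bias is -sigma^2 / n.
est = samples.var(axis=1)

bias = est.mean() - sigma2
var = est.var()
mse = ((est - sigma2) ** 2).mean()
print(f"bias={bias:+.4f} (theory {-sigma2 / n:+.4f})")
print(f"mse={mse:.4f} vs var + bias^2 = {var + bias ** 2:.4f}")
```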
Loss functions can be used to evaluate point estimates. A point estimate can also be paired with a confidence interval to express its uncertainty.
Best Asymptotically Normal (BAN) estimators, also known as Consistently Asymptotically Normal Efficient (CANE) estimators, are root-n consistent.
Feasible estimators use only known terms; infeasible estimators depend on unknown ones. For example, an estimator involving the unknown matrix \(\Omega\) is infeasible, unless we assume the form of \(\Omega\), making it feasible.
Related topics: the Cramér-Rao bound; Minimum-Variance Unbiased Estimators (MVUE).
There are unbiased estimators for some kernel values (as with the kernels of U-statistics); these can be used to estimate population moments.
We can consider the statistic \(X_n\) to be a sequence indexed by the sample size \(n\). We are interested in the asymptotic properties of this sequence.
Fat tails cause problems for several of the techniques above: if the relevant moments do not exist, the population mean cannot be reliably estimated from the sample mean, the method of moments breaks down, and sample correlations and covariances become unreliable.