Given a sequence \(x_1,\dots,x_n\).
The likelihood is:
\(L=\prod_{i=2}^n p_{x_{i-1},x_i}\)
If there are \(k\) states we can rewrite this as:
\(L=\prod_{i=1}^k\prod_{j=1}^k p_{ij}^{n_{ij}}\)
Where \(p_{ij}\) is the probability of moving from state \(i\) to state \(j\), and \(n_{ij}\) is the number of transitions from \(i\) to \(j\) observed in the sequence.
The log likelihood is:
\(\ln L=\sum_{i=1}^k\sum_{j=1}^kn_{ij}\ln p_{ij}\)
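This log likelihood can be computed directly from the transition counts. A minimal sketch, assuming integer-coded states and a hypothetical helper `log_likelihood`:

```python
import numpy as np

def log_likelihood(seq, P):
    """ln L = sum_ij n_ij ln p_ij, conditioning on the first observation."""
    k = P.shape[0]
    # Count transitions n_ij from the sequence
    n = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        n[a, b] += 1
    # Terms with n_ij = 0 contribute nothing, so skip them (avoids log(0))
    mask = n > 0
    return float((n[mask] * np.log(P[mask])).sum())

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
seq = [0, 0, 1, 0, 1, 1]
print(round(log_likelihood(seq, P), 4))  # → -6.0968
```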
Not all parameters are free: each row of the transition matrix must sum to \(1\). Introducing a Lagrange multiplier \(\lambda_i\) for each row constraint, we maximise
\(\mathcal{L}=\sum_{i=1}^k\sum_{j=1}^k n_{ij}\ln p_{ij}-\sum_{i=1}^k\lambda_i\left(\sum_{j=1}^k p_{ij}-1\right)\)
Setting the derivatives to zero gives:
\(\hat p_{ij}=\dfrac{n_{ij}}{\sum_{j'=1}^k n_{ij'}}\)
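The estimator is just the observed transition counts, normalised row by row. A minimal sketch, with a hypothetical helper `estimate_transition_matrix`:

```python
import numpy as np

def estimate_transition_matrix(seq, k):
    """MLE p_hat_ij = n_ij / sum_j' n_ij'.
    Rows with no observed transitions are left at zero."""
    n = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        n[a, b] += 1
    totals = n.sum(axis=1, keepdims=True)
    # Divide each row by its total, guarding against empty rows
    return np.divide(n, totals, out=np.zeros_like(n), where=totals > 0)

seq = [0, 0, 1, 0, 1, 1, 0]
print(estimate_transition_matrix(seq, 2))
# → [[1/3, 2/3],
#    [2/3, 1/3]]
```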
We can represent the transition matrix as a set of rules to reduce the number of free parameters:
\(P(x_t \mid x_{t-1})=f(x_{t-1},x_t)\)
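For example, instead of fitting \(k(k-1)\) free entries, a rule can generate the whole matrix from one or two parameters. A sketch under an assumed rule (a simple random walk that steps up or down with probability \(p\), reflecting at the boundaries; the rule is illustrative, not from the source):

```python
import numpy as np

def rule_transition(k, p=0.3):
    """Build a k x k transition matrix from a rule with a single
    parameter p, rather than k*(k-1) free parameters."""
    P = np.zeros((k, k))
    for i in range(k):
        P[i, min(i + 1, k - 1)] += p      # step up (reflect at top)
        P[i, max(i - 1, 0)] += p          # step down (reflect at bottom)
        P[i, i] += 1 - 2 * p              # otherwise stay put
    return P

P = rule_transition(5)
print(P.sum(axis=1))  # each row sums to 1
```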
States can be represented as numbers rather than atomic labels; they could then be discrete integers, or even continuous (real-valued). In more complex settings, states can be vectors.
For estimation to work, the sample moments must converge to the moments of the generating process. This is not guaranteed.
E.g. a process with path dependence starting at £50: the generating (expected) average is £50, but any given sample path converges to £100 or £0.
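One concrete process with this behaviour (my choice of instantiation, not specified in the source) is gambler's ruin: fair ±£1 bets from £50 until absorption at £0 or £100. The expectation stays at £50 at every step, yet every sample path ends at one of the extremes:

```python
import random

def gamblers_ruin(start=50, target=100, seed=0):
    """Fair +/-1 betting until absorbed at 0 or target.
    E[X_t] = start for all t, but each path ends at 0 or target."""
    rng = random.Random(seed)
    x = start
    while 0 < x < target:
        x += rng.choice((-1, 1))
    return x

finals = [gamblers_ruin(seed=s) for s in range(200)]
print(sum(finals) / len(finals))  # average across paths is near 50...
print(set(finals))                # ...but every single path is 0 or 100
```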