The Metropolis-Hastings algorithm generates a sequence of samples \(x_i\) such that the distribution of the samples approaches a target distribution \(f\).
The algorithm takes an arbitrary starting sample \(x_0\). It then must decide which sample to consider next.
It does this using a Markov chain. That is, the next candidate \(x_j\) is drawn from a proposal distribution \(g(x_j \mid x_i)\) that depends only on the current sample \(x_i\).
This proposal is generally a normal distribution centred on \(x_i\), making the process a random walk.
Once we have a candidate sample, we either accept or reject it. It is this step that makes the final distribution approximate the target.
For a symmetric proposal such as the random walk above, we accept if \(\dfrac{f(x_j)}{f(x_i)}>u\), where \(u\) is a uniform random variable between \(0\) and \(1\), generated afresh each time.
We can calculate this ratio because we can evaluate \(f\) directly; since any normalising constant cancels in the ratio, \(f\) only needs to be known up to proportionality.
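The steps above can be sketched in Python. This is a minimal illustration, not a production sampler; the target here is an unnormalised standard normal density chosen purely as an example, and the proposal step size of 1.0 is an arbitrary assumption.

```python
import math
import random

def metropolis_hastings(f, x0, n_samples, step=1.0):
    """Random-walk Metropolis-Hastings targeting the (possibly
    unnormalised) density f, starting from the arbitrary sample x0."""
    samples = [x0]
    x_i = x0
    for _ in range(n_samples):
        # Propose x_j from a normal distribution centred on x_i.
        x_j = random.gauss(x_i, step)
        # Accept if f(x_j)/f(x_i) > u, with u uniform on (0, 1).
        u = random.uniform(0.0, 1.0)
        if f(x_j) / f(x_i) > u:
            x_i = x_j
        samples.append(x_i)
    return samples

# Example: sample from an unnormalised standard normal density.
random.seed(0)
draws = metropolis_hastings(lambda x: math.exp(-x * x / 2), 0.0, 10000)
mean = sum(draws) / len(draws)
```

Note that the normalising constant \(1/\sqrt{2\pi}\) is deliberately omitted from the target: the acceptance ratio cancels it, as described above.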
In Gibbs sampling, as with Metropolis-Hastings, we want to generate samples from \(P(X)\) and use them to approximate its form.
We do this by using the conditional distributions. If \(X\) is a vector with components \(x_0,...,x_n\), then for each component we have:
\(P(x_j|x_0,...,x_{j-1},x_{j+1},...,x_n)\)
We assume these conditional distributions are known and easy to sample from.
Start with a vector \(x_0\).
This has components \(x_{0,j}\).
To form the next vector \(x_1\) we loop through each component.
\(P(x_{1,0}|x_{0,1},...,x_{0,n})\)
We sample from this to form \(x_{1,0}\).
However, after the first component we update the conditioning set so that it uses the already-updated components:
\(P(x_{1,k}|x_{1,0},...,x_{1,k-1},x_{0,k+1},...,x_{0,n})\)
This means we only need to know the conditional distributions.
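The component-by-component loop can be sketched for a case where the conditionals are known in closed form. This is a hypothetical example target, not part of the original notes: a bivariate standard normal with correlation \(\rho\), whose conditionals are themselves normal.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, start=(0.0, 0.0)):
    """Gibbs sampler for a bivariate standard normal with correlation
    rho.  Each conditional is normal: x | y ~ N(rho * y, 1 - rho**2),
    and symmetrically for y | x, so we only need these conditionals,
    never the joint density itself."""
    x, y = start
    sd = (1 - rho ** 2) ** 0.5
    samples = []
    for _ in range(n_samples):
        # Update the first component given the current second one...
        x = random.gauss(rho * y, sd)
        # ...then the second given the freshly updated first component.
        y = random.gauss(rho * x, sd)
        samples.append((x, y))
    return samples

random.seed(0)
draws = gibbs_bivariate_normal(0.8, 5000)
# E[x * y] equals rho for this target, so this estimate should be near 0.8.
corr_est = sum(x * y for x, y in draws) / len(draws)
```

Each pass through the loop updates one component at a time, always conditioning on the most recent values of the others, exactly as in the update rule above.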