We start with a random set of parameters, \(x\).
We then loop through the following:
We define a search space local to our current selection.
We randomly select a point from this space.
We compare the new point to our current point; if the new point is better, we move to it.
This is similar to random search, but we sample from a multivariate Gaussian distribution centred on our current point rather than uniformly from a hypersphere.
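A minimal sketch of this Gaussian local search, assuming we are maximising a user-supplied function \(f\) over real-valued parameters (the example function, step size `sigma`, and iteration count are illustrative, not from the source):

```python
import numpy as np

def gaussian_local_search(f, x0, sigma=0.1, n_iters=1000, seed=0):
    """Hill climbing with Gaussian proposals: step from the current point
    using a multivariate Gaussian, and keep the step only if it improves f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(n_iters):
        candidate = rng.normal(loc=x, scale=sigma)  # Gaussian step around x
        f_candidate = f(candidate)
        if f_candidate > fx:                        # move only if strictly better
            x, fx = candidate, f_candidate
    return x, fx

# Example: maximise a concave function with its peak at (2, 2)
best_x, best_f = gaussian_local_search(lambda x: -np.sum((x - 2.0) ** 2),
                                       x0=[0.0, 0.0])
```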
We can use a version of Metropolis-Hastings to find the global maximum of a function \(f(x)\).
We start with an arbitrary point \(x_0\).
We move randomly from this point to identify a candidate point \(x_c\).
We accept the candidate with a probability that depends on the relationship between \(f(x_0)\) and \(f(x_c)\).
Run for long enough, this process concentrates its samples around the global maximum.
There is a hyperparameter controlling selection (a temperature); at one extreme the rule becomes greedy, accepting only improvements.
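A sketch of this acceptance rule, assuming Gaussian proposals and an exponential acceptance probability controlled by a temperature hyperparameter (the names and defaults here are illustrative assumptions):

```python
import numpy as np

def mh_maximise(f, x0, sigma=0.5, temperature=1.0, n_iters=5000, seed=0):
    """Metropolis-Hastings-style search: always accept uphill moves, and
    accept downhill moves with probability exp((f(x_c) - f(x)) / T)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x, fx
    for _ in range(n_iters):
        x_c = rng.normal(loc=x, scale=sigma)  # random move to a candidate
        f_c = f(x_c)
        # As temperature -> 0 this tends to accept only improvements (greedy)
        if f_c >= fx or rng.random() < np.exp((f_c - fx) / temperature):
            x, fx = x_c, f_c
            if fx > best_f:
                best_x, best_f = x, fx
    return best_x, best_f
```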
If we have already sampled from the hyperparameter space, we know something about the shape of the function.
Can we use this to inform where we should next look?
The function has the form \(y=f(\mathbf x)\).
We have observations \(\mathbf X\) and \(\mathbf y\).
So what’s our posterior at a new point \(\mathbf x\), \(P(y \mid \mathbf x, \mathbf X, \mathbf y)\)?
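One common way to model this posterior is a Gaussian process surrogate; a minimal sketch using scikit-learn (the kernel choice and toy data are assumptions, not from the source):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Observations (X, y) of the unknown function
X = np.array([[0.0], [1.0], [2.5], [4.0]])
y = np.sin(X).ravel()

# Fit a GP surrogate to the observations
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# Posterior mean and uncertainty at new candidate points
X_new = np.linspace(0.0, 5.0, 50).reshape(-1, 1)
mu, std = gp.predict(X_new, return_std=True)
```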
There is a tradeoff between:
Exploring - which improves our estimate of the shape of \(y=f(x)\); and
Exploiting - which gives us a better estimate for the global optimum.
We do not know \(y=f(x)\), but we model our noisy observations of it as:
\(z(x)=y(x)+\epsilon\)
where \(\epsilon\) is observation noise. We can then maximise our model of \(z\).
We want an algorithm which maps from our history of observations to a new candidate.
There are different approaches:
Probability of improvement - Choosing one with the highest chance of a more optimal value
Expected improvement - Choosing one with the biggest expected increase in the optimal value
Entropy search - Choosing one which most reduces our uncertainty about the location of the global maximum.
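A minimal sketch of the expected improvement acquisition, assuming a surrogate (such as the GP above) that returns a posterior mean `mu` and standard deviation `std` at each candidate; the exploration parameter `xi` is an assumption:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, std, best_y, xi=0.01):
    """EI: expected amount by which each candidate beats the best value so far."""
    std = np.maximum(std, 1e-12)          # guard against zero predictive variance
    z = (mu - best_y - xi) / std
    return (mu - best_y - xi) * norm.cdf(z) + std * norm.pdf(z)

# Pick the candidate with the highest expected improvement:
# next_x = X_new[np.argmax(expected_improvement(mu, std, y.max()))]
```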
We generate an initial population of candidate parameter values, \(x\).
We evaluate each of these against a fitness function (the function we are optimising).
We assign fitness values to each individual.
We generate a second generation. We select "parents" randomly using the fitness values as weightings.
The values of each new individual are a combination of the parents' values (crossover), plus noise (mutation).
We do this for each member in the next generation.
We iterate this process across successive generations.
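A minimal sketch of this loop for real-valued parameters, assuming fitness-proportional selection, blend crossover, and Gaussian mutation (the population size, generation count, and mutation scale are illustrative defaults):

```python
import numpy as np

def genetic_algorithm(fitness, dim, pop_size=50, n_generations=100,
                      mutation_sigma=0.1, seed=0):
    """Evolve a population of real-valued parameter vectors."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, dim))  # initial candidates
    for _ in range(n_generations):
        scores = np.array([fitness(ind) for ind in population])
        # Fitness-proportional selection weights (shifted to be non-negative)
        weights = scores - scores.min() + 1e-9
        weights /= weights.sum()
        children = []
        for _ in range(pop_size):
            i, j = rng.choice(pop_size, size=2, replace=False, p=weights)
            alpha = rng.random()
            # Crossover: blend the two parents' values
            child = alpha * population[i] + (1 - alpha) * population[j]
            # Mutation: add Gaussian noise
            child += rng.normal(scale=mutation_sigma, size=dim)
            children.append(child)
        population = np.array(children)
    scores = np.array([fitness(ind) for ind in population])
    return population[np.argmax(scores)]

# Example: maximise a fitness peaked at the all-ones vector
best = genetic_algorithm(lambda x: -np.sum((x - 1.0) ** 2), dim=3)
```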