A constraint satisfaction problem (CSP) is one where we don't care about the path taken; we just want to identify a goal state.
For example, solving a Sudoku puzzle.
A CSP has:
Variables \(X_i\).
Domain for each variable \(D_i\).
Constraints \(C_j\).
In a CSP there is a set of variables, each with a domain of possible values, plus constraints on which combinations of values are allowed.
A solution is a complete assignment of values to variables that violates no constraint.
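As a rough sketch (not from the notes), a small CSP can be written down with plain Python dictionaries; the toy instance below uses a simple "neighbours must differ" constraint, and all names are just illustrative:

```python
# Toy CSP: three variables that must all take different values.
variables = ["A", "B", "C"]                                 # the X_i
domains = {v: {"red", "green", "blue"} for v in variables}  # the D_i

# neighbours[X] lists the variables sharing a binary constraint with X.
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

def constraint(x_var, x_val, y_var, y_val):
    """The C_j here: neighbouring variables must take different values."""
    return x_val != y_val
```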
To solve, start with no variables assigned. The successor function assigns a value to an unassigned variable. The goal test checks that every variable is assigned and no constraint is violated.
Use the minimum remaining values (MRV) heuristic: choose the variable with the fewest remaining legal values.
Least constraining value: choose the value in the domain that rules out the fewest choices for the other variables.
Forward checking: keep track of the remaining legal values for each unassigned variable, and terminate a branch if any variable has none left.
After each assignment, update the legal values for each neighbouring variable.
Implement all of this with a recursive backtracking function, which returns a solution or failure. This is a depth-first search.
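A minimal sketch of that recursive backtracking search, with MRV and forward checking; it assumes the toy representation above (domains as sets, a `neighbours` map, and a `constraint` predicate), which is my own illustration rather than a fixed API:

```python
def backtrack(assignment, domains, variables, neighbours, constraint):
    """Recursive depth-first search; returns a complete assignment or None."""
    if len(assignment) == len(variables):        # goal test: everything assigned
        return assignment

    # MRV: pick the unassigned variable with the fewest remaining legal values.
    var = min((v for v in variables if v not in assignment),
              key=lambda v: len(domains[v]))

    for value in list(domains[var]):
        assignment[var] = value
        # Forward checking: prune neighbours' domains given this assignment.
        pruned = {v: set(d) for v, d in domains.items()}
        pruned[var] = {value}
        ok = True
        for n in neighbours[var]:
            if n not in assignment:
                pruned[n] = {w for w in pruned[n] if constraint(var, value, n, w)}
                if not pruned[n]:                # a domain emptied: abandon this value
                    ok = False
                    break
        if ok:
            result = backtrack(assignment, pruned, variables, neighbours, constraint)
            if result is not None:
                return result
        del assignment[var]

    return None                                  # failure: no value worked
```

Called as `backtrack({}, domains, variables, neighbours, constraint)` on the toy instance above, it returns one consistent assignment.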
The arc X → Y is arc consistent if every value in the domain of X is consistent with some value in the domain of Y.
X is node consistent if every value in its domain satisfies all of X's unary constraints.
Path consistency generalises arc consistency to additional variables.
Constraint propagation can be used to prevent bad choices
We can check for:
node-consistency
arc-consistency
path-consistency
The AC-3 algorithm makes a CSP arc-consistent.
Maintain a queue of all arcs; pop an arc \((X, Y)\) and remove from the domain of X any value with no supporting value in the domain of Y; if X's domain changed, re-add all arcs pointing at X.
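A sketch of AC-3 under the same toy representation (the `neighbours` map and `constraint` predicate are my illustrative names):

```python
from collections import deque

def revise(domains, x, y, constraint):
    """Remove values of x that have no supporting value in the domain of y."""
    removed = False
    for vx in list(domains[x]):
        if not any(constraint(x, vx, y, vy) for vy in domains[y]):
            domains[x].discard(vx)
            removed = True
    return removed

def ac3(domains, neighbours, constraint):
    """Make every arc (x, y) consistent; returns False if some domain empties."""
    queue = deque((x, y) for x in neighbours for y in neighbours[x])  # all arcs
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, constraint):
            if not domains[x]:
                return False                      # no consistent assignment exists
            # Re-check arcs pointing at x, since its domain shrank.
            queue.extend((z, x) for z in neighbours[x] if z != y)
    return True
```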
It may be possible to break the problem down into independent subproblems, making them much easier to solve.
This decomposition can be done before or after running the other algorithms.
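One common way to find such subproblems is to take the connected components of the constraint graph; a small sketch, again assuming the `neighbours` adjacency map from the earlier illustration:

```python
def connected_components(neighbours):
    """Group variables into independent subproblems (connected components
    of the constraint graph); each component can be solved separately."""
    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            seen.add(v)
            stack.extend(n for n in neighbours[v] if n not in comp)
        components.append(comp)
    return components
```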
For active learning, to select the next point we only need the covariance matrix (the predictive variance does not depend on the observed \(y\) values); we just pick the point with the highest variance.
Active learning is a greedy algorithm for reducing the entropy (uncertainty) of the posterior.
As we get more info, our posterior becomes our new prior
If we can pick which observations to use to update the model, we should use those with the biggest predictive variance.
This can be useful if getting \(y\) is expensive, e.g. it requires an experiment.
4 steps:
Form \(p(y_0|x_0,y,X)\) for all unmeasured \(x_0\).
Choose the \(x_0\) with the largest \(\sigma_0^2\) and observe its \(y_0\).
Update the posterior parameters with this new observation.
Repeat.
\(\sigma_0^2 =\sigma^2+x_0^T\Sigma x_0\)
Updating \(\Sigma\) and \(\mu\) for Bayesian linear regression:
\(\Sigma = (\lambda I + \sigma^{-2}(x_0x_0^T+\sum_{i=1}^nx_ix_i^T))^{-1}\)
\(\mu = (\lambda \sigma^2I+ x_0x_0^T+\sum_{i=1}^nx_ix_i^T)^{-1}(x_0y_0+\sum_{i=1}^nx_iy_i)\)
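A small NumPy sketch of these two updates (with \(x_0\) included as a row of `X`; the names `lam` and `sigma2` for \(\lambda\) and \(\sigma^2\) are mine):

```python
import numpy as np

def posterior(X, y, lam, sigma2):
    """Posterior over the weights for Bayesian linear regression.
    X is (n, d) with each measured x_i (including x_0) as a row; y has length n."""
    d = X.shape[1]
    xxT = X.T @ X                                           # sum_i x_i x_i^T
    Sigma = np.linalg.inv(lam * np.eye(d) + xxT / sigma2)   # posterior covariance
    mu = np.linalg.inv(lam * sigma2 * np.eye(d) + xxT) @ (X.T @ y)  # posterior mean
    return mu, Sigma
```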
Once we have an \(x_0\) we can easily get \(\mu_0\) by calculating \(x_0^T\mu\), i.e. multiplying by the mean weights.
We can also get the variance of the estimate:
\(\sigma^2_0=\sigma^2+x_0^T\Sigma x_0\)
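Putting the whole loop together, a sketch of the greedy active-learning procedure; the pool of unmeasured points, the `measure` callback that runs the expensive experiment, and the default \(\lambda\), \(\sigma^2\) values are all illustrative assumptions:

```python
import numpy as np

def active_learning(X_pool, measure, n_queries, lam=1.0, sigma2=0.1):
    """Greedily query the pool point with the largest predictive variance,
    observe its y via measure(i), and refit the posterior each time."""
    d = X_pool.shape[1]
    chosen, ys = [], []
    Sigma = np.eye(d) / lam                       # prior covariance
    mu = np.zeros(d)                              # prior mean
    for _ in range(n_queries):
        # sigma_0^2 = sigma^2 + x_0^T Sigma x_0 for every candidate x_0
        var = sigma2 + np.einsum("ij,jk,ik->i", X_pool, Sigma, X_pool)
        if chosen:
            var[chosen] = -np.inf                 # never re-query a measured point
        i = int(np.argmax(var))
        chosen.append(i)
        ys.append(measure(i))                     # the expensive observation
        X, y = X_pool[chosen], np.array(ys)
        # Posterior update (same formulas as above).
        Sigma = np.linalg.inv(lam * np.eye(d) + (X.T @ X) / sigma2)
        mu = np.linalg.inv(lam * sigma2 * np.eye(d) + (X.T @ X)) @ (X.T @ y)
    return mu, Sigma, chosen

# Prediction for a new x0: mean x0 @ mu, variance sigma2 + x0 @ Sigma @ x0.
```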