A constraint satisfaction problem (CSP) is one where we don't care about the path taken; we just want to identify a goal state.
For example, solving a Sudoku puzzle.
A CSP has:
Variables \(X_i\).
Domain for each variable \(D_i\).
Constraints \(C_j\).
In a CSP there is a set of variables, each with a domain of possible values, plus constraints on which combinations of values are allowed.
A solution is a complete assignment of values to variables that violates no constraint.
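As a rough sketch (not from the notes), a small CSP can be written down with plain Python dictionaries; the toy instance below uses a simple "neighbours must differ" constraint, and all names are just illustrative:

```python
# Toy CSP: three variables that must all take different values.
variables = ["A", "B", "C"]                                 # the X_i
domains = {v: {"red", "green", "blue"} for v in variables}  # the D_i

# neighbours[X] lists the variables sharing a binary constraint with X.
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

def constraint(x_var, x_val, y_var, y_val):
    """The C_j here: neighbouring variables must take different values."""
    return x_val != y_val
```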
To solve, start with no variables assigned. The successor function assigns a value to an unassigned variable. The goal test checks that every variable is assigned and no constraint is violated.
Use the minimum remaining values (MRV) heuristic: choose the variable with the fewest remaining legal values.
Least constraining value: choose the value in the domain that rules out the fewest choices for the other variables.
Forward checking: keep track of the remaining legal values for each unassigned variable, and terminate a branch if any variable has none left.
After each assignment, update the legal values for each neighbouring variable.
Implement all of this with a recursive backtracking function, which returns a solution or failure. This is a depth-first search.
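A minimal sketch of that recursive backtracking search, with MRV and forward checking; it assumes the toy representation above (domains as sets, a `neighbours` map, and a `constraint` predicate), which is my own illustration rather than a fixed API:

```python
def backtrack(assignment, domains, variables, neighbours, constraint):
    """Recursive depth-first search; returns a complete assignment or None."""
    if len(assignment) == len(variables):        # goal test: everything assigned
        return assignment

    # MRV: pick the unassigned variable with the fewest remaining legal values.
    var = min((v for v in variables if v not in assignment),
              key=lambda v: len(domains[v]))

    for value in list(domains[var]):
        assignment[var] = value
        # Forward checking: prune neighbours' domains given this assignment.
        pruned = {v: set(d) for v, d in domains.items()}
        pruned[var] = {value}
        ok = True
        for n in neighbours[var]:
            if n not in assignment:
                pruned[n] = {w for w in pruned[n] if constraint(var, value, n, w)}
                if not pruned[n]:                # a domain emptied: abandon this value
                    ok = False
                    break
        if ok:
            result = backtrack(assignment, pruned, variables, neighbours, constraint)
            if result is not None:
                return result
        del assignment[var]

    return None                                  # failure: no value worked
```

Called as `backtrack({}, domains, variables, neighbours, constraint)` on the toy instance above, it returns one consistent assignment.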
The arc X → Y is arc consistent if every value in the domain of X is consistent with some value in the domain of Y.
X is node consistent if every value in its domain satisfies all of X's unary constraints.
Path consistency generalises arc consistency to additional variables.
Constraint propagation can be used to prevent bad choices
We can check for:
node-consistency
arc-consistency
path-consistency
The AC-3 algorithm makes a CSP arc-consistent.
Maintain a queue of all arcs; pop an arc \((X, Y)\) and remove from the domain of X any value with no supporting value in the domain of Y; if X's domain changed, re-add all arcs pointing at X.
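A sketch of AC-3 under the same toy representation (the `neighbours` map and `constraint` predicate are my illustrative names):

```python
from collections import deque

def revise(domains, x, y, constraint):
    """Remove values of x that have no supporting value in the domain of y."""
    removed = False
    for vx in list(domains[x]):
        if not any(constraint(x, vx, y, vy) for vy in domains[y]):
            domains[x].discard(vx)
            removed = True
    return removed

def ac3(domains, neighbours, constraint):
    """Make every arc (x, y) consistent; returns False if some domain empties."""
    queue = deque((x, y) for x in neighbours for y in neighbours[x])  # all arcs
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, constraint):
            if not domains[x]:
                return False                      # no consistent assignment exists
            # Re-check arcs pointing at x, since its domain shrank.
            queue.extend((z, x) for z in neighbours[x] if z != y)
    return True
```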
It may be possible to break the problem down into independent subproblems, making them much easier to solve.
This decomposition can be done before or after running the other algorithms.
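One common way to find such subproblems is to take the connected components of the constraint graph; a small sketch, again assuming the `neighbours` adjacency map from the earlier illustration:

```python
def connected_components(neighbours):
    """Group variables into independent subproblems (connected components
    of the constraint graph); each component can be solved separately."""
    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            seen.add(v)
            stack.extend(n for n in neighbours[v] if n not in comp)
        components.append(comp)
    return components
```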
For active learning, to select the next point we only need the covariance matrix (the predictive variance does not depend on the observed \(y\) values); we just pick the point with the highest variance.
Active learning is a greedy algorithm for reducing the entropy (uncertainty) of the posterior.
As we get more info, our posterior becomes our new prior
If we can pick which observations to use to update the model, we should use those with the biggest predictive variance.
This can be useful if getting \(y\) is expensive, e.g. it requires an experiment.
4 steps:
Form \(p(y_0|x_0,y,X)\) for all unmeasured \(x_0\).
Choose the \(x_0\) with the largest \(\sigma_0^2\) and observe its \(y_0\).
Update the posterior parameters with this new observation.
Repeat.
\(\sigma_0^2 =\sigma^2+x_0^T\Sigma x_0\)
Updating \(\Sigma\) and \(\mu\) for Bayesian linear regression:
\(\Sigma = (\lambda I + \sigma^{-2}(x_0x_0^T+\sum_{i=1}^nx_ix_i^T))^{-1}\)
\(\mu = (\lambda \sigma^2I+ x_0x_0^T+\sum_{i=1}^nx_ix_i^T)^{-1}(x_0y_0+\sum_{i=1}^nx_iy_i)\)
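A small NumPy sketch of these two updates (with \(x_0\) included as a row of `X`; the names `lam` and `sigma2` for \(\lambda\) and \(\sigma^2\) are mine):

```python
import numpy as np

def posterior(X, y, lam, sigma2):
    """Posterior over the weights for Bayesian linear regression.
    X is (n, d) with each measured x_i (including x_0) as a row; y has length n."""
    d = X.shape[1]
    xxT = X.T @ X                                           # sum_i x_i x_i^T
    Sigma = np.linalg.inv(lam * np.eye(d) + xxT / sigma2)   # posterior covariance
    mu = np.linalg.inv(lam * sigma2 * np.eye(d) + xxT) @ (X.T @ y)  # posterior mean
    return mu, Sigma
```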
Once we have an \(x_0\) we can easily get \(\mu_0\) by calculating \(x_0^T\mu\), i.e. multiplying by the mean weights.
We can also get the variance of the estimate:
\(\sigma^2_0=\sigma^2+x_0^T\Sigma x_0\)
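Putting the whole loop together, a sketch of the greedy active-learning procedure; the pool of unmeasured points, the `measure` callback that runs the expensive experiment, and the default \(\lambda\), \(\sigma^2\) values are all illustrative assumptions:

```python
import numpy as np

def active_learning(X_pool, measure, n_queries, lam=1.0, sigma2=0.1):
    """Greedily query the pool point with the largest predictive variance,
    observe its y via measure(i), and refit the posterior each time."""
    d = X_pool.shape[1]
    chosen, ys = [], []
    Sigma = np.eye(d) / lam                       # prior covariance
    mu = np.zeros(d)                              # prior mean
    for _ in range(n_queries):
        # sigma_0^2 = sigma^2 + x_0^T Sigma x_0 for every candidate x_0
        var = sigma2 + np.einsum("ij,jk,ik->i", X_pool, Sigma, X_pool)
        if chosen:
            var[chosen] = -np.inf                 # never re-query a measured point
        i = int(np.argmax(var))
        chosen.append(i)
        ys.append(measure(i))                     # the expensive observation
        X, y = X_pool[chosen], np.array(ys)
        # Posterior update (same formulas as above).
        Sigma = np.linalg.inv(lam * np.eye(d) + (X.T @ X) / sigma2)
        mu = np.linalg.inv(lam * sigma2 * np.eye(d) + (X.T @ X)) @ (X.T @ y)
    return mu, Sigma, chosen

# Prediction for a new x0: mean x0 @ mu, variance sigma2 + x0 @ Sigma @ x0.
```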