Alpha-beta pruning
We can use pruning to reduce the number of nodes we search.
Alpha-beta pruning is one such method. For each node we maintain two values:
Alpha. The best (highest) value found so far for the maximising player: a lower bound on the node's value.
Beta. The best (lowest) value found so far for the minimising player: an upper bound on the node's value.
We can know for sure that we don't need to explore some paths. For example, consider the min player: suppose we already know one of its options has minimax value \(2\), and a neighbouring max node has a child with value \(3\). That max node's value is at least \(3\), so the min player will never choose it, and its remaining children need not be explored.
Alpha-beta works like DFS, but keeps track of:
alpha (the largest value for max across viewed children), initialised to \(-\infty\)
beta (the smallest value for min across viewed children), initialised to \(+\infty\)
Propagate \(\alpha\) and \(\beta\) down the tree during the search; prune wherever \(\alpha \ge \beta\).
Update \(\alpha\) and \(\beta\) as node values propagate back up.
Only update alpha at max nodes and beta at min nodes (sketched below).
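As a sketch (not from the notes), the procedure above in Python, assuming a state object with hypothetical `is_terminal()`, `utility()` and `children()` methods:

```python
import math

def alpha_beta(state, is_max, alpha=-math.inf, beta=math.inf):
    """Depth-first minimax with alpha-beta pruning.

    `state` is assumed to expose is_terminal(), utility() and children();
    these names are placeholders for whatever the real game API provides.
    """
    if state.is_terminal():
        return state.utility()
    if is_max:
        value = -math.inf
        for child in state.children():
            value = max(value, alpha_beta(child, False, alpha, beta))
            alpha = max(alpha, value)   # only update alpha at max nodes
            if alpha >= beta:           # remaining children cannot matter
                break                   # prune
        return value
    else:
        value = math.inf
        for child in state.children():
            value = min(value, alpha_beta(child, True, alpha, beta))
            beta = min(beta, value)     # only update beta at min nodes
            if alpha >= beta:
                break                   # prune
        return value
```

Called as `alpha_beta(root, True)` when the maximising player moves at the root.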
Ordering matters: with the worst ordering there is no pruning, so we want an ordering that produces lots of pruning. We can:
do a shallow search first (to guide the ordering)
order nodes so the best are explored first
use domain-knowledge heuristics (chess: captures, then threats, then forward moves, then backward moves)
In practice, with good ordering, this can get the time down to \(O(b^{m/2})\) (versus \(O(b^m)\) for plain minimax).
Use a heuristic evaluation function; Deep Blue's evaluation used around \(6000\) features.
If the tree is explored in an efficient order, alpha-beta pruning is more effective.
Monte Carlo tree search
In the search tree, each node records wins/total simulations.
Start with just the root at 0/0.
Algorithm:
Start at the root.
Take choices down the tree until arriving at a node that has not been explored (or at a terminal win/loss state).
Play randomly from there.
Back-propagate the result up: e.g. if a win, add 1/1 to every node on the path back to the root, or 0/1 if a loss.
Each node's score is its wins/simulation count.
So path selection is deterministic, up until the random playout.
A way to choose paths (the UCB1 formula):
\(\dfrac{w}{n}+c\sqrt {\dfrac{\ln N}{n}}\)
\(w\) is the number of wins from the node chosen
\(n\) is the number of simulations from the node chosen
\(N\) is the number of simulations through the chosen node's parent
\(c\) is an exploration constant (often \(\sqrt 2\))
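A compact sketch of the whole loop, assuming a hypothetical game API with `legal_moves`, `apply`, `is_terminal` and a `result` function returning 1 for a win and 0 for a loss from our player's point of view (real two-player implementations also flip the result by whose turn it is):

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []        # expanded children
        self.wins, self.n = 0, 0  # wins / simulation count through this node

def ucb1(node, c=math.sqrt(2)):
    # Unvisited children are tried first; otherwise apply the formula above.
    if node.n == 0:
        return math.inf
    return node.wins / node.n + c * math.sqrt(math.log(node.parent.n) / node.n)

def mcts_iteration(root, game):
    # Selection: deterministic descent by UCB1 until an unexpanded node.
    node = root
    while node.children:
        node = max(node.children, key=ucb1)
    # Expansion: grow the tree at a node that has already been sampled once.
    if node.n > 0 and not game.is_terminal(node.state):
        node.children = [Node(game.apply(node.state, m), node)
                         for m in game.legal_moves(node.state)]
        node = random.choice(node.children)
    # Simulation: play randomly from here to a terminal state.
    state = node.state
    while not game.is_terminal(state):
        state = game.apply(state, random.choice(game.legal_moves(state)))
    # Backpropagation: 1/1 for a win, 0/1 for a loss, along the path to root.
    win = game.result(state)  # assumed: 1 if a win for our player, else 0
    while node is not None:
        node.n += 1
        node.wins += win
        node = node.parent
```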
A board is described by:
The layout of the board
Whose turn it is
Move history
These can be represented as nodes in a tree, the root node being the start of the game and each branch being a move taken.
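As an illustrative sketch, such a node might be represented like this (the field names are assumptions, not from the notes):

```python
from dataclasses import dataclass, field

@dataclass
class GameNode:
    layout: tuple           # the layout of the board, e.g. piece positions
    to_move: str            # whose turn it is, e.g. "white" or "black"
    history: list = field(default_factory=list)   # moves taken from the root
    children: list = field(default_factory=list)  # one child per legal move
```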
The number of possible nodes can quickly become very large, as in chess and Go.
We can use CNNs to evaluate positions: the board is the input, and the output can be a value for each action. These can be trained on game records, e.g. human chess games.
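A minimal PyTorch sketch of such a network; the 12-plane \(8 \times 8\) board encoding, layer sizes, and the number of actions are illustrative assumptions:

```python
import torch.nn as nn

class PolicyCNN(nn.Module):
    """Maps a board tensor to a score for each possible action."""
    def __init__(self, planes=12, actions=4096):  # e.g. 12 piece planes, 64*64 moves
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(planes, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, actions),  # one value per action
        )

    def forward(self, board):  # board: (batch, planes, 8, 8)
        return self.net(board)
```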
Games are different to search. In a search problem we are looking for a sequence of actions; in a game we are looking for a strategy (a reaction function), because unlike in search there are other players.
We can use iterative deepening search.
The search space can be too big to look through all the nodes.
Rather than search all the way to win states, we evaluate non-terminal states using heuristics.
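For example, a crude chess evaluation might just count material; the piece values and the `state.count` helper here are illustrative assumptions:

```python
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(state):
    """Heuristic value of a non-terminal state: our material minus theirs."""
    score = 0
    for piece, value in PIECE_VALUES.items():
        score += value * (state.count("white", piece) - state.count("black", piece))
    return score
```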
We can use expectiminimax (sketched below):
For a max node, return the highest expectiminimax value of the children.
For a min node, return the lowest expectiminimax value of the children.
For a chance node, return the average of the children's values, weighted by their probabilities.
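A sketch of the recursion, assuming each node carries a hypothetical `kind` tag and chance nodes expose `(probability, child)` pairs via an `outcomes()` method:

```python
def expectiminimax(node):
    if node.is_terminal():
        return node.utility()
    if node.kind == "max":
        return max(expectiminimax(c) for c in node.children())
    if node.kind == "min":
        return min(expectiminimax(c) for c in node.children())
    # Chance node: average of children, weighted by their probabilities.
    return sum(p * expectiminimax(c) for p, c in node.outcomes())
```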
Minimax: two players, max and min.
Max goes first and maximises the result; min minimises the result.
A node's minimax value is the best outcome achievable against an optimal opponent.
To find the optimal strategy, do a depth-first search of the game tree.
Propagate minimax values up the tree as terminal nodes are discovered.
If a state is terminal, its value is the utility of that state.
If a state is a max node, its value is the highest value of its children.
If a state is a min node, its value is the lowest value of its children.
Minimax is optimal and complete (for finite trees).
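The recursion above as a Python sketch, using the same hypothetical state interface as the alpha-beta example:

```python
def minimax(state, is_max):
    # Terminal state: the value is the utility of the state.
    if state.is_terminal():
        return state.utility()
    values = (minimax(child, not is_max) for child in state.children())
    # Max node takes the highest child value, min node the lowest.
    return max(values) if is_max else min(values)
```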