Deep sequential games

Alpha-beta pruning

We can use pruning to reduce the number of nodes we search.

Alpha-beta pruning is a method for pruning. For each node we maintain two values:

  • Alpha. The highest value found so far for the max player: a lower bound on the node’s minimax value.

  • Beta. The lowest value found so far for the min player: an upper bound on the node’s minimax value.

We can know for sure that some paths don’t need to be explored. For example, consider the min player: suppose one child of a min node is known to have minimax value \(2\), and a neighbouring child (a max node) has one child with value \(3\). The max node’s value is then at least \(3\), so the min player will never prefer it to \(2\), and its remaining children can be pruned.

Like DFS, but keep track of:

  • alpha (largest value for max across viewed children); initialise to \(-\infty \)

  • beta (smallest value for min across viewed children); initialise to \(+\infty \)

Propagate \(\alpha \) and \(\beta \) down during the search; prune where \(\alpha \ge \beta \).

Update \(\alpha \) and \(\beta \) by propagating values upwards.

Only update alpha at max nodes, and beta at min nodes.
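The rules above can be sketched as a depth-first search. A minimal sketch, assuming the game tree is given as nested lists whose leaves are utility values (a hypothetical representation, not a full game implementation):

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Depth-first minimax search with alpha-beta pruning.

    `node` is either a leaf utility (a number) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node                      # terminal: return its utility
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)    # only update alpha at max nodes
            if alpha >= beta:            # min above will never allow this
                break                    # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)      # only update beta at min nodes
            if alpha >= beta:
                break
        return value
```

On the example tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` with a max root, this returns \(3\) while skipping some pruned children of the second subtree.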

Ordering matters: with the worst ordering there is no pruning. We want an ordering that gives lots of pruning. We can:

  • do shallow searches first

  • order nodes so the best are explored first

  • use domain-knowledge heuristics (chess: captures, threats, forward moves, backward moves)

In practice this can bring the time down to \(O(b^\frac{m}{2})\).

Use a heuristic evaluation function (Deep Blue used \(6000\)).

Late Move Reductions (LMR)


If the tree is explored in an efficient order, alpha-beta pruning is more effective. Late move reductions exploit this: moves that appear late in the ordering are searched to a reduced depth, on the assumption that a good ordering puts the best moves first.

Partial evaluations


Identifying heuristics

Monte-Carlo Tree Search


In the search tree, each node stores a wins/total count.

Start with just the root at \(0/0\).


  • Start at the root.

  • Take choices until arriving at a node which has not been explored (or until a win/loss state).

  • Play out randomly from there.

  • Backpropagate the result (e.g. if a win, add \(1/1\) to each node on the path back to the root, or \(0/1\) if a loss).

Each node’s score is its wins divided by its simulation count.

So choosing paths is deterministic, up until the random playout.

Upper Confidence bound 1 applied to Trees (UCT)


A way to choose paths during selection.

\(\dfrac{w}{n}+c\sqrt {\dfrac{\ln N}{n}}\)

\(w\) is number of wins from node chosen

\(n\) is number of simulations from node chosen

\(N\) is the number of simulations from the parent of the node chosen

\(c\) is an exploration constant (commonly \(\sqrt {2}\))
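The selection rule can be sketched as follows. A minimal sketch, with \(c\) defaulting to \(\sqrt 2\) and unvisited children given an infinite score so each is tried at least once:

```python
import math

def uct_score(w, n, N, c=math.sqrt(2)):
    """UCT value of a child: w wins out of n simulations at the child,
    N simulations at the parent, c the exploration constant."""
    if n == 0:
        return math.inf          # unvisited children are tried first
    return w / n + c * math.sqrt(math.log(N) / n)

def select_child(children):
    """children: list of (wins, visits) pairs for one node's children.
    Returns the index of the UCT-best child.  The parent count is
    approximated here by the sum of the child visit counts."""
    N = sum(n for _, n in children)
    scores = [uct_score(w, n, N) for w, n in children]
    return scores.index(max(scores))
```

For example, with children at \(6/10\), \(3/10\) and \(0/0\), the unvisited third child is selected; once every child has been visited, the exploration term trades off win rate against visit count.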

Defining board games

A board is described by:

  • The layout of the board

  • Whose turn it is

  • Move history

These can be represented as nodes: the root node is the start of the game, and each branch is a move taken.

The number of possible nodes can quickly become very large, as in chess and go.

Training with reinforcement learning

Reinforcement learning for combinatorial games


Processing visual observations using neural networks


We can use CNNs.

The output can be the values for each action.
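As a sketch of the data flow (not a trained network), here is a tiny convolution-then-linear model in NumPy that maps an \(8\times 8\) board observation to one value per action. The filter count, kernel size and action count are arbitrary assumptions, and the weights are random:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    """Valid 2D convolution. x: (H, W); kernels: (K, kh, kw).
    Returns feature maps of shape (K, H-kh+1, W-kw+1)."""
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return out

n_actions = 4                                     # assumed action count
kernels = rng.standard_normal((8, 3, 3)) * 0.1    # 8 random 3x3 filters
board = rng.standard_normal((8, 8))               # toy 8x8 observation

features = np.maximum(conv2d(board, kernels), 0)  # ReLU feature maps
flat = features.ravel()                           # flatten: 8 * 6 * 6 = 288
W_out = rng.standard_normal((n_actions, flat.size)) * 0.1
action_values = W_out @ flat                      # one value per action
```

Training would adjust `kernels` and `W_out` so that `action_values` approximates the value of each action in the observed state.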

Experience replay and catastrophic interference
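A replay buffer stores past transitions and samples random minibatches for training; sampling uniformly breaks the correlation between consecutive observations, which is one mitigation for catastrophic interference. A minimal sketch (the transition format here is an assumption):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch: breaks the temporal correlation of
        # consecutive transitions before they are used for a gradient step.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=1000)
for t in range(50):
    buf.add((t, 0, 0.0, t + 1, False))  # toy transitions
batch = buf.sample(8)
```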

Training with supervised learning

e.g. training on human chess games




Games are different to search. In a search algorithm we are looking for a sequence of actions; in a game we are looking for a reaction function. Unlike in search, there are other players.

We can use iterative deepening search.


The search space can be too big to look through all the nodes.

Rather than look for win states, we evaluate a non-terminal state using heuristics.

Stochastic games

Can use expectiminimax.

For a max node, return the highest expectiminimax value of the children.

For a min node, return the lowest expectiminimax value of the children.

For a chance node, return the average of the children, weighted by their probabilities.
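The three rules above can be sketched recursively. A minimal sketch, assuming a hypothetical tuple encoding of the tree in which chance children carry their probabilities:

```python
def expectiminimax(node):
    """node is ('leaf', utility), ('max', children), ('min', children),
    or ('chance', [(prob, child), ...])."""
    kind, payload = node
    if kind == 'leaf':
        return payload
    if kind == 'max':
        return max(expectiminimax(c) for c in payload)
    if kind == 'min':
        return min(expectiminimax(c) for c in payload)
    # chance node: probability-weighted average of the children
    return sum(p * expectiminimax(c) for p, c in payload)
```

For example, a fair coin flip leading either to a leaf of utility \(4\) or to a min node over leaves \(2\) and \(6\) has value \(0.5 \cdot 4 + 0.5 \cdot 2 = 3\).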

Minimax: two players, max min

Max goes first and maximises the result; min minimises the result.

A node’s minimax value is the best outcome achievable against an optimal opponent.

To find optimal strategy, depth first search of game tree.

Propagate minimax values up the tree as terminal nodes are discovered.

If a state is terminal, its value is the utility of the state.

If a state is a max node, its value is the highest value of its children.

If a state is a min node, its value is the lowest value of its children.

Minimax is optimal, complete (for finite trees)
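The recursive definition above, as a depth-first search. A minimal sketch, again assuming nested lists whose leaves are utility values:

```python
def minimax(node, is_max):
    """Depth-first minimax over a tree of nested lists; leaves are utilities."""
    if not isinstance(node, list):
        return node                                # terminal: utility of the state
    values = [minimax(c, not is_max) for c in node]
    return max(values) if is_max else min(values)  # propagate up the tree
```

On the tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` with a max root, each child is a min node valued \(3\), \(2\) and \(2\), so the root’s minimax value is \(3\).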