Deep sequential games

Alpha-beta pruning

Pruning

Alpha-beta pruning

We can use pruning to reduce the number of nodes we search.

Alpha-beta pruning is a method for pruning. For each node we maintain two values:

  • Alpha. The best (highest) value the max player can guarantee so far; a lower bound on the node’s value.

  • Beta. The best (lowest) value the min player can guarantee so far; an upper bound on the node’s value.

We can sometimes be sure that a path need not be explored. For example, consider a min player choosing between two nodes: one whose minimax value is known to be \(2\), and a neighbour (a max node) with one child already valued at \(3\). The neighbour’s value is at least \(3\), so the min player will never choose it, and its remaining children can be pruned.

Alpha-beta search works like DFS, but keeps track of:

  • alpha (the largest value seen so far across the children of a max node), initialised to \(-\infty \)

  • beta (the smallest value seen so far across the children of a min node), initialised to \(+\infty \)

Propagate \(\alpha \) and \(\beta \) down the tree during search, and prune whenever \(\alpha \ge \beta \).

Update \(\alpha \) and \(\beta \) by propagating values upwards: alpha is updated only at max nodes, beta only at min nodes.
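The procedure above can be sketched as follows (a minimal sketch; `children(state)` and `evaluate(state)` are illustrative helper functions, assumed to return a state’s child states and its heuristic or utility value):

```python
import math

def alphabeta(state, depth, alpha, beta, is_max, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if is_max:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)   # alpha is updated only at max nodes
            if alpha >= beta:           # min above will never allow this branch
                break
    else:
        value = math.inf
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, children, evaluate))
            beta = min(beta, value)     # beta is updated only at min nodes
            if alpha >= beta:
                break
    return value
```

Starting the search with \(\alpha = -\infty\), \(\beta = +\infty\) at the root gives the same value as plain minimax, while skipping subtrees that cannot affect the result.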

Ordering matters: with the worst ordering there is no pruning. We want an ordering that gives lots of pruning, so we can:

  • search shallow nodes first

  • order nodes so the best are explored first

  • use domain-knowledge heuristics (in chess: captures, threats, forward moves, backward moves)

In practice, with good move ordering, this can bring the time down to \(O(b^\frac{m}{2})\).

Use a heuristic evaluation function; Deep Blue used around \(6000\).

Late Move Reductions (LMR)

Introduction

If the tree is explored in an efficient order, alpha-beta pruning is more effective.

Partial evaluations

Heuristics

Identifying heuristics

Monte-Carlo Tree Search

Introduction

In the search tree, each node stores a count of wins over total simulations.

So start with just the root, at \(0/0\).

Algorithm:

  • Start at the root.

  • Follow choices down the tree until arriving at a node that has not been explored (or a win/loss state).

  • Play out the game randomly from there.

  • Backpropagate the result (e.g. on a win, add \(1/1\) to each node on the path back to the root; \(0/1\) on a loss).

Each node’s statistic is thus wins over simulation count.

Path selection is deterministic (given the current statistics), up until the random playout.
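The whole loop can be sketched end to end on a toy game. This is a sketch under stated assumptions: the game is a tiny Nim variant (remove 1 or 2 counters; whoever takes the last counter wins), selection uses the UCT rule described in the next section, and each node’s win count is kept from the perspective of the player who just moved into it (a common convention, not the only one):

```python
import math
import random

class Node:
    """MCTS tree node; wins are counted for the player who just moved."""
    def __init__(self, state, parent=None):
        self.state = state        # counters remaining in the toy Nim game
        self.parent = parent
        self.children = []
        self.wins = 0
        self.visits = 0

def moves(state):
    return [m for m in (1, 2) if m <= state]

def uct(node, c=1.4):
    if node.visits == 0:
        return math.inf           # unvisited children are tried first
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend deterministically via UCT while fully expanded
        node = root
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children, key=uct)
        # 2. Expansion: add one unexplored child, if the node is non-terminal
        untried = [m for m in moves(node.state)
                   if node.state - m not in {ch.state for ch in node.children}]
        if untried:
            child = Node(node.state - random.choice(untried), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout; taking the last counter wins
        state, parity, last_parity = node.state, 0, 1
        while state:
            state -= random.choice(moves(state))
            last_parity = parity
            parity ^= 1
        mover_wins = (last_parity == 1)   # odd-parity moves belong to node's mover
        # 4. Backpropagation: flip the winning perspective at each level
        win = mover_wins
        while node:
            node.visits += 1
            node.wins += win
            win = not win
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return root_state - best.state        # the move leading to the best child
```

Choosing the final move by visit count (rather than win rate) is a common convention: the most-visited child is the one the statistics are most confident about.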

Upper Confidence bound 1 applied to Trees (UCT)

Introduction

A way to choose which path to follow during selection:

\(\dfrac{w}{n}+c\sqrt {\dfrac{\ln N}{n}}\)

\(w\) is the number of wins recorded at the chosen node.

\(n\) is the number of simulations run through the chosen node.

\(N\) is the number of simulations run through its parent (the total at that layer of the choice).

\(c\) is an exploration constant, often \(\sqrt {2}\).
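The formula translates directly into a small function (a sketch; returning \(+\infty\) for unvisited nodes is a common convention so they get tried first, not part of the formula itself):

```python
import math

def uct_score(w, n, N, c=math.sqrt(2)):
    """UCB1 applied to trees: win rate plus an exploration bonus."""
    if n == 0:
        return math.inf   # unvisited nodes are explored first (convention)
    return w / n + c * math.sqrt(math.log(N) / n)
```

The first term favours nodes that have won often (exploitation); the second grows for nodes visited rarely relative to their parent (exploration), and \(c\) trades the two off.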

Defining board games

A board is described by:

  • The layout of the board

  • Whose turn it is

  • Move history

These can be represented as nodes in a tree, the root node being the start of the game and each branch being a move taken.

The number of possible nodes can quickly become very large, as in chess and go.
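A minimal sketch of such a state as a Python dataclass (the field names and the `apply` helper are illustrative assumptions, not from any particular library; a real game would also update the layout):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoardState:
    layout: tuple        # the layout of the board (kept immutable)
    to_move: int         # whose turn it is: 0 or 1
    history: tuple = ()  # the move history from the root

    def apply(self, move):
        """Return the child node reached by `move` (layout update omitted)."""
        return BoardState(self.layout, 1 - self.to_move,
                          self.history + (move,))
```

Keeping states immutable means each node of the game tree is a distinct value, so a parent is never corrupted by exploring its children.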

Training with reinforcement learning

Reinforcement learning for combinatorial games

Other

Processing visual observations using neural networks

Introduction

We can use CNNs.

The output can be the values for each action.

Experience replay and catastrophic interference

Training with supervised learning

e.g. human chess games.

Other

Minimax

Games

Games are different to search. In a search algorithm we are looking for a sequence of actions; in a game we are looking for a reaction function. Unlike in search, there are other players.

We can use iterative deepening search.

Heuristics

The search space can be too big to look through all the nodes.

Rather than look for win states, we evaluate a non-terminal state using heuristics.

Stochastic games

We can use expectiminimax.

For a max node, return the highest expectiminimax value of the children.

For a min node, return the lowest expectiminimax value of the children.

For a chance node, return the average of the children’s values, weighted by their probabilities.
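The three cases can be sketched as follows (a minimal sketch assuming nodes are dicts with a `kind` key, and chance nodes list `(probability, child)` pairs; this representation is an illustrative assumption):

```python
def expectiminimax(node):
    kind = node["kind"]
    if kind == "leaf":
        return node["value"]
    if kind == "max":
        return max(expectiminimax(c) for c in node["children"])
    if kind == "min":
        return min(expectiminimax(c) for c in node["children"])
    # chance node: average of children, weighted by probability
    return sum(p * expectiminimax(c) for p, c in node["children"])
```

For example, a fair coin flip over leaves worth \(2\) and \(4\) has expectiminimax value \(0.5 \cdot 2 + 0.5 \cdot 4 = 3\).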

Minimax: two players, max and min.

Max goes first and maximises the result; min minimises it.

A node’s minimax value is the best outcome achievable against an optimally playing opponent.

To find the optimal strategy, perform a depth-first search of the game tree.

Propagate minimax values up the tree as terminal nodes are discovered:

  • If a state is terminal, its value is the utility of the state.

  • If a state is a max node, its value is the highest value of its children.

  • If a state is a min node, its value is the lowest value of its children.

Minimax is optimal and complete (for finite trees).
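The rules above translate directly into a depth-first search (a sketch; `children(state)` and `utility(state)` are assumed helpers describing the game tree, not from any particular library):

```python
def minimax(state, is_max, children, utility):
    """Plain minimax: DFS that propagates values up from terminal states."""
    kids = children(state)
    if not kids:                      # terminal: value is the utility of the state
        return utility(state)
    values = [minimax(c, not is_max, children, utility) for c in kids]
    return max(values) if is_max else min(values)
```

This is the unpruned version of the alpha-beta search given earlier: it visits every node, which is why pruning and depth limits matter for real games.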