# Deep sequential games

### Alpha-beta pruning

#### Pruning


We can use pruning to reduce the number of nodes we search.

Alpha-beta pruning is one such method. For each node we maintain two values:

• Alpha: the best (highest) value the maximising player can guarantee so far, i.e. a lower bound on the node's value.

• Beta: the best (lowest) value the minimising player can guarantee so far, i.e. an upper bound on the node's value.

We can know for sure that some paths need not be explored. For example, suppose the min player is to move, and we have already established that one child has a minimax value of $$2$$ (so the min node's value is at most $$2$$). If a neighbouring child (a max node) has one child with a value of $$3$$, then that neighbour's value is at least $$3$$, so the min player will never choose it and its remaining children can be pruned.

The algorithm runs like DFS, but keeps track of:

• alpha (the largest value seen so far for max across explored children), initialised to $$-\infty$$

• beta (the smallest value seen so far for min across explored children), initialised to $$+\infty$$

Propagate $$\alpha$$ and $$\beta$$ down during the search, and prune wherever $$\alpha \ge \beta$$.

Update $$\alpha$$ and $$\beta$$ by propagating values upwards: update $$\alpha$$ only at max nodes, and $$\beta$$ only at min nodes.

Ordering matters: with the worst ordering there is no pruning at all. We want an ordering that gives lots of pruning. We can:

• do shallow searches first

• order nodes so the best are explored first

• use domain-knowledge heuristics (chess: captures, threats, forward moves, backward moves)

In practice, good ordering can bring the time down from $$O(b^m)$$ to $$O(b^\frac{m}{2})$$.

In practice we also use a heuristic evaluation function: Deep Blue's used around $$6000$$ features.

### Late Move Reductions (LMR)

#### Introduction

Late Move Reductions exploit the fact that alpha-beta pruning is more effective when the tree is explored in an efficient order: moves that appear late in the move ordering (and so are likely worse) are searched to a reduced depth.

## Partial evaluations

### Monte-Carlo Tree Search

#### Introduction

In the search tree, each node stores a win count and a visit count (wins/total).

We start with just the root at $$0/0$$.

Algorithm:

• Start at the root.

• Make choices down the tree until reaching a node that has not been explored (or a win/loss state).

• Play randomly from there.

• Backpropagate the result: if a win, add $$1/1$$ to every node on the path back to the root; if a loss, add $$0/1$$.

Each node's score is its win count divided by its simulation count.

Path selection is therefore deterministic until the random-play (rollout) phase.
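The bookkeeping can be sketched as below: each node stores wins/total, and a rollout result is backed up along the path to the root. The `Node` class and its field names are illustrative assumptions, not part of any standard API.

```python
class Node:
    """One node of the MCTS tree, storing wins/total statistics."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.wins = 0    # simulations through this node that ended in a win
        self.visits = 0  # total simulations through this node

def backpropagate(node, won):
    """Walk back to the root, adding 1/1 for a win or 0/1 for a loss."""
    while node is not None:
        node.visits += 1
        node.wins += int(won)
        node = node.parent

# Usage: the root starts at 0/0; after one winning rollout through a
# child, both the child and the root read 1/1.
root = Node()
child = Node(parent=root)
root.children.append(child)
backpropagate(child, won=True)
```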

### Upper Confidence bound 1 applied to Trees (UCT)

#### Introduction

UCT is a way to choose paths during the selection phase: pick the child maximising

$$\dfrac{w}{n}+c\sqrt {\dfrac{\ln N}{n}}$$

$$w$$ is the number of wins from the node chosen

$$n$$ is the number of simulations from the node chosen

$$N$$ is the number of simulations through the parent node (i.e. at the layer above)

$$c$$ is an exploration constant, commonly $$\sqrt{2}$$
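The formula translates directly into code. The function name and the convention of returning infinity for unvisited children are assumptions of this sketch (the latter is a common way to ensure every child gets tried at least once).

```python
import math

def uct(w, n, N, c=math.sqrt(2)):
    """UCT score: exploitation term w/n plus an exploration bonus
    that shrinks as the child accumulates more simulations n."""
    if n == 0:
        return math.inf  # always try unvisited children first
    return w / n + c * math.sqrt(math.log(N) / n)
```

During selection, the child with the highest `uct` score is followed; setting `c = 0` reduces this to greedily picking the best observed win rate.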

### Defining board games

A game state is described by:

• The layout of the board

• Whose turn it is

• Move history

These states can be represented as nodes in a tree, with the root node being the start of the game and each branch being a move taken.

The number of possible nodes can quickly become very large, as in chess and Go.
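The three ingredients above map naturally onto a tree-node record. The class and field names here are illustrative assumptions, not a standard representation.

```python
from dataclasses import dataclass, field

@dataclass
class GameNode:
    layout: tuple          # the layout of the board
    to_move: int           # whose turn it is (0 or 1)
    history: tuple = ()    # the move history leading to this state
    children: list = field(default_factory=list)  # one child per legal move

# The root node is the start of the game, e.g. an empty 3x3 board:
start = GameNode(layout=("empty",) * 9, to_move=0)
```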

## Other

### Processing visual observations using neural networks

#### Introduction

We can use CNNs.

The output can be the values for each action.

### Training with supervised learning

E.g. train on a dataset of human chess games.

## Other

### Minimax

#### Games

Games are different to search. In a search problem we are looking for a sequence of actions; in a game we are looking for a strategy (a reaction function). Unlike in search, there are other players.

We can use iterative deepening search.

#### Heuristics

The search space can be too big to look through all the nodes.

Rather than look for win states, we evaluate a non-terminal state using heuristics.
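A toy example of such a heuristic is material count in chess, using the conventional piece values; the position encoding below is an assumption made for illustration.

```python
# Conventional chess piece values (king excluded: it is never captured).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def evaluate(pieces):
    """Heuristic value of a non-terminal state.
    pieces: list of (piece_letter, owner), owner +1 for us, -1 for them."""
    return sum(owner * PIECE_VALUES[p] for p, owner in pieces)
```

A state where we have a queen and the opponent a pawn scores $$9 - 1 = 8$$; the heuristic stands in for the true utility at the search horizon.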

#### Stochastic games

We can use expectiminimax.

For a max node, return the highest expectiminimax value of the children.

For a min node, return the lowest expectiminimax value of the children.

For a chance node, return the probability-weighted average of the children's values.
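The three cases can be sketched over a small hand-built tree. The encoding is an assumption of this sketch: a leaf is a plain number, and an internal node is a tuple `(kind, children)`, where a chance node's children carry their probabilities.

```python
def expectiminimax(node):
    """node: a number (utility), ('max', children), ('min', children),
    or ('chance', [(probability, child), ...])."""
    if not isinstance(node, tuple):
        return node  # terminal: return its utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: average of children, weighted by probability
    return sum(p * expectiminimax(c) for p, c in children)
```

For example, a max node choosing between a fair coin flip over utilities 2 and 4 (expected value 3) and a certain 1 takes the value 3.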

Minimax involves two players: max and min.

Max goes first and maximises the result; min minimises it.

A node's minimax value is the best achievable outcome against an optimal opponent.

To find the optimal strategy, run a depth-first search of the game tree.

Propagate minimax values up the tree as terminal nodes are discovered:

If a state is terminal, its value is the utility of the state.

If a state is a max node, its value is the highest value of its children.

If a state is a min node, its value is the lowest value of its children.

Minimax is optimal and complete (for finite trees).