# Reinforcement learning

## Reinforcement learning for Markov Decision Processes

### Unknown MDP transition matrix

### Exploration to understand the MDP

## Temporal difference learning for Markov Decision Processes

### Temporal difference learning

## Q-learning for Markov Decision Processes

### Q-functions

When the transition matrix is unknown, we can't choose actions from state-value functions alone, so we use Q-values instead.

\(Q(s,a)\) is the value of taking action \(a\) in state \(s\).

\(Q(s,a)=R_{s,a}+\gamma \mathbb{E}_{s'}[\max_{a'}Q(s',a')]\).

A Q-table stores the Q-values: one real number assigned to every action/state combination.
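As a rough illustration of the recursive equation for \(Q(s,a)\), the sketch below applies its right-hand side repeatedly to a small table until the values settle. The two-state, two-action MDP, the names `R`, `P`, `GAMMA` and `bellman_backup`, and the assumption that the transition probabilities are known (so the expectation can be computed exactly) are all hypothetical choices made for this example only.

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
GAMMA = 0.9  # discount factor (assumed value)

# R[s][a]: immediate reward for taking action a in state s.
R = [[0.0, 1.0],
     [2.0, 0.0]]

# P[s][a][s2]: probability of landing in state s2 after taking a in s.
# Assumed known here so that the expectation over s' can be computed exactly.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]


def bellman_backup(Q):
    """Apply Q(s,a) = R[s][a] + gamma * E_{s'}[max_{a'} Q(s',a')] once."""
    n_states, n_actions = len(R), len(R[0])
    new_Q = [[0.0] * n_actions for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            # Expected value of acting greedily from the next state.
            expected_best = sum(
                P[s][a][s2] * max(Q[s2]) for s2 in range(n_states)
            )
            new_Q[s][a] = R[s][a] + GAMMA * expected_best
    return new_Q


# Start from an all-zero Q-table and repeat the backup.
Q = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    Q = bellman_backup(Q)
```

After enough repetitions the table stops changing, meaning it satisfies the equation for every state/action pair.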

### Value iteration for Q

### \(\epsilon \)-greedy

## Q-learning for large state Markov Decision Processes

### Function approximation for Q

E.g. a neural network mapping a state/action pair to \(\mathbb{R}\).
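A minimal sketch of what such a network could look like, assuming a single hidden layer, a one-hot encoding of the action, and randomly initialised (untrained) weights. The names `init_q_network` and `q_value`, the layer size, and the example state are hypothetical; the sketch only shows the forward pass from a state/action pair to a real number, not how the weights would be learned.

```python
import math
import random


def init_q_network(state_dim, n_actions, hidden=8, seed=0):
    """Create random weights for a one-hidden-layer network (untrained)."""
    rng = random.Random(seed)
    in_dim = state_dim + n_actions  # state features plus one-hot action
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
        "n_actions": n_actions,
    }


def q_value(params, state, action):
    """Forward pass: (state, action) -> a single real number."""
    one_hot = [1.0 if i == action else 0.0
               for i in range(params["n_actions"])]
    x = list(state) + one_hot
    # Hidden layer with tanh activation.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    # Linear output layer.
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


params = init_q_network(state_dim=2, n_actions=3)
state = (0.1, -0.4)  # hypothetical continuous state
q_values = [q_value(params, state, a) for a in range(3)]
greedy_action = max(range(3), key=lambda a: q_values[a])
```

Because the network takes the state as a vector of features, it never needs one table entry per state, which is the point of function approximation for large state spaces.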

### Quantisation of states

Shrink the number of states by mapping nearby states to the same discrete bin.
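One simple way to do this, sketched here under assumed bounds and bin counts, is to cut each state dimension into equal-width bins and combine the bin indices into a single integer that can index a Q-table. The helper names `quantise` and `quantise_state`, the `BOUNDS`, and `N_BINS` are hypothetical.

```python
def quantise(value, low, high, n_bins):
    """Map a continuous value to a bin index in 0..n_bins-1.

    Values outside [low, high] are clipped to the nearest edge bin.
    """
    clipped = min(max(value, low), high)
    fraction = (clipped - low) / (high - low)
    # The upper edge (fraction == 1.0) is folded into the last bin.
    return min(int(fraction * n_bins), n_bins - 1)


def quantise_state(state, bounds, n_bins):
    """Quantise each dimension, then flatten the indices into one state id."""
    index = 0
    for value, (low, high) in zip(state, bounds):
        index = index * n_bins + quantise(value, low, high, n_bins)
    return index


# Hypothetical 2-dimensional state space with 4 bins per dimension,
# giving 4 * 4 = 16 discrete states in total.
BOUNDS = [(-1.0, 1.0), (0.0, 10.0)]
N_BINS = 4
state_id = quantise_state((0.3, 9.9), BOUNDS, N_BINS)
```

Fewer bins give a smaller table but lump more distinct states together, so the bin count trades table size against resolution.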

### End-to-end reinforcement learning

### Deep Q-Network (DQN)

## Reinforcement learning for POMDPs

### Unknown POMDP transition matrix

### Stacking frames

### Deep Recurrent Q-Network