Reinforcement learning

Reinforcement learning for Markov Decision Processes

Unknown MDP transition matrix

Exploration to understand the MDP

Temporal difference learning for Markov Decision Processes

Temporal difference learning

Q-learning for Markov Decision Processes

Q-functions

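Without a known transition model we cannot act greedily on a state-value function \(V(s)\), because choosing the best action needs a one-step look-ahead through the transition probabilities. We therefore learn Q-values instead.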

\(Q(s,a)\) is the value of taking action \(a\) in state \(s\).

\(Q(s,a)=R_{s,a}+\gamma \mathbb{E}_{s'}\left[\max_{a'}Q(s',a')\right]\).

A Q-table assigns a real-valued Q-value to every state/action combination.
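A Q-table and the standard tabular Q-learning update can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `make_q_table` and `q_learning_update`, the learning rate `alpha`, and the state labels `"A"` and `"B"` are assumptions introduced here.

```python
from collections import defaultdict


def make_q_table(n_actions):
    """Q-table: maps each state to a list of per-action values (default 0)."""
    return defaultdict(lambda: [0.0] * n_actions)


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Move Q[s][a] towards the sampled target r + gamma * max_a' Q[s'][a'].

    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # At a terminal transition there is no future value to bootstrap from.
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# One observed transition: in state "A", action 1 gave reward 1.0 and led to "B".
Q = make_q_table(n_actions=2)
q_learning_update(Q, s="A", a=1, r=1.0, s_next="B")
```

Because the update uses only a sampled transition \((s, a, r, s')\), it does not need the transition matrix.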

Value iteration for Q
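As a rough sketch, value iteration for Q repeatedly applies the Bellman backup \(Q(s,a)\leftarrow R_{s,a}+\gamma \mathbb{E}_{s'}[\max_{a'}Q(s',a')]\) until the table stops changing. The example assumes the transition probabilities are known (or have been estimated). The toy MDP `P`, `R`, `GAMMA` and the function name `q_value_iteration` are hypothetical, chosen only for illustration.

```python
# Hypothetical toy MDP: 2 states, 2 actions (all numbers are illustrative).
# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.5


def q_value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Apply the Bellman optimality backup to a Q-table until convergence."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(max_iters):
        delta = 0.0
        new_Q = {}
        for s in P:
            new_Q[s] = {}
            for a in P[s]:
                # Expectation over next states of the best next Q-value.
                expected = sum(p * max(Q[s2].values()) for s2, p in P[s][a])
                new_Q[s][a] = R[s][a] + gamma * expected
                delta = max(delta, abs(new_Q[s][a] - Q[s][a]))
        Q = new_Q
        if delta < tol:  # stop once no entry changed noticeably
            break
    return Q


Q = q_value_iteration(P, R, GAMMA)
```

For this toy problem the fixed point can be checked by hand: always taking action 1 in state 1 gives \(Q(1,1)=2+0.5\cdot Q(1,1)\), so \(Q(1,1)=4\).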

\(\epsilon \)-greedy
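A minimal sketch of \(\epsilon\)-greedy action selection: with probability \(\epsilon\) the agent explores with a uniformly random action, otherwise it exploits the action with the highest current Q-value. The function name `epsilon_greedy` and the action labels are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one.

    q_values: dict mapping action -> estimated Q(s, a) for the current state.
    rng: any object with random() and choice(), e.g. random.Random(seed).
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    best = max(q_values.values())
    # Exploit; ties between equally good actions are broken at random.
    return rng.choice([a for a in actions if q_values[a] == best])


# With epsilon = 0 the choice is purely greedy.
action = epsilon_greedy({"left": 0.1, "right": 0.7}, epsilon=0.0)
```

In practice \(\epsilon\) is often decayed over time so that the agent explores more early on and exploits more later.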

Q-learning for large state Markov Decision Processes

Function approximation for Q

E.g. a neural network that maps a state/action pair to a real number, \((s,a)\mapsto \mathbb{R}\).
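To make the idea concrete, here is a small, untrained network written in plain Python that takes a state vector plus a one-hot action and returns a single real number. It is only a sketch of the forward pass; the class name `TinyQNetwork`, the layer sizes, and the initialisation range are all assumptions. Training (fitting the output to a Bellman target) is left out.

```python
import math
import random


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class TinyQNetwork:
    """One hidden tanh layer mapping (state, action) to a single real number."""

    def __init__(self, state_dim, n_actions, hidden=8, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + n_actions
        self.n_actions = n_actions
        # Small random weights; illustrative initialisation only.
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def q(self, state, action):
        """Approximate Q(s, a) for a state vector and an integer action."""
        x = list(state) + one_hot(action, self.n_actions)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def greedy_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.q(state, a))


net = TinyQNetwork(state_dim=2, n_actions=3)
q_estimates = [net.q([0.1, -0.4], a) for a in range(3)]
```

The network replaces the Q-table: instead of storing one entry per state/action pair, it shares parameters across states, which is what makes large state spaces tractable.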

Quantisation of states

Shrink the number of states by grouping similar states into a single discrete bin, so that a Q-table becomes feasible again.
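One simple way to quantise a continuous state is to cut each dimension into equal-width bins and use the tuple of bin indices as the Q-table key. The sketch below assumes this uniform binning; the helper names `make_bins` and `quantise` and the value ranges are hypothetical.

```python
from bisect import bisect_right


def make_bins(low, high, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins on [low, high]."""
    width = (high - low) / n_bins
    return [low + width * i for i in range(1, n_bins)]


def quantise(state, edges_per_dim):
    """Map a continuous state vector to a tuple of bin indices.

    Values outside [low, high] fall into the first or last bin.
    """
    return tuple(bisect_right(edges, x) for x, edges in zip(state, edges_per_dim))


# Two state dimensions: the first in [-1, 1] with 4 bins, the second in [0, 10] with 5.
edges = [make_bins(-1.0, 1.0, 4), make_bins(0.0, 10.0, 5)]
key = quantise([0.3, 7.2], edges)  # (2, 3)
```

Coarser bins give a smaller table but blur together states that may need different actions, so the bin count is a trade-off.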

End-to-end reinforcement learning

Deep Q-Network (DQN)

Reinforcement learning for POMDPs

Unknown POMDP transition matrix

Stacking frames
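Frame stacking can be sketched as a fixed-length buffer: the agent's state is the last \(k\) observations rather than a single one, which lets it recover information (such as velocity) that one frame alone does not reveal. The class name `FrameStack` and its methods are assumptions for this example; frames can be any objects.

```python
from collections import deque


class FrameStack:
    """Keep the k most recent observations and expose them as one state."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with copies of the first frame so the state
        # always has exactly k entries.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return self.state()

    def state(self):
        return tuple(self.frames)


stack = FrameStack(k=3)
stack.reset("f0")
stacked = stack.push("f1")  # ("f0", "f0", "f1")
```

The stacked tuple is then fed to the Q-function in place of the single latest observation.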

Deep Recurrent Q-Network