Reinforcement learning for Markov Decision Processes
Unknown MDP transition matrix
Exploration to understand the MDP
Temporal difference learning for Markov Decision Processes
Temporal difference learning
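As a reminder of the basic idea, the standard TD(0) update for a state-value estimate can be written as follows. The symbols \(\alpha\) (step size), \(\gamma\) (discount factor), \(r_{t+1}\) (observed reward) and \(V\) (value estimate) are notation assumed here and are not fixed elsewhere in these notes.

```latex
\[
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
\]
```

The bracketed term is the TD error: the gap between the one-step bootstrapped target and the current estimate. It only needs a sampled transition, not the transition matrix.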
Q-learning for Markov Decision Processes
Without a known transition model we cannot act greedily on state values alone, so we learn Q-values (action values) instead.
\(Q(s,a)\) is the value of taking action \(a\) in state \(s\).
A Q-table assigns a real-valued Q-value to every state/action combination.
Value iteration for Q
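When the transition matrix is unknown, the expectation in Q-value iteration can be replaced by sampled transitions, which gives tabular Q-learning. The following is a minimal sketch under stated assumptions: the `step` function signature, the 4-state chain environment `chain_step`, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random

def q_learning(step, n_states, n_actions, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` is an assumed interface returning (next_state, reward, done).
    Every episode starts in state 0.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                best = max(q[s])              # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next, r, done = step(s, a)
            # Sampled Bellman target; no bootstrapping from terminal states.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q

def chain_step(s, a):
    """Hypothetical chain: action 1 moves right, action 0 moves left.

    Reaching state 3 pays reward 1 and ends the episode.
    """
    s_next = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s_next == 3
    return s_next, (1.0 if done else 0.0), done

q_table = q_learning(chain_step, n_states=4, n_actions=2)
```

In this toy chain the learned table should prefer moving right in every non-terminal state, since that is the only way to reach the reward.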
Q-learning for large state Markov Decision Processes
Function approximation for Q
e.g. a neural network mapping a state/action pair to \(\mathbb{R}\)
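To make the idea concrete without a deep learning library, the sketch below swaps the neural network for a linear model \(Q(s,a) \approx w^\top \phi(s,a)\) and applies one semi-gradient Q-learning update. The feature map `features`, the scalar state, and the step size are assumptions made for illustration; a neural network would replace `q_value` and its gradient.

```python
def features(state, action, n_actions):
    """Hypothetical feature map: place (1, state) in the block for `action`."""
    phi = [0.0] * (2 * n_actions)
    phi[2 * action] = 1.0
    phi[2 * action + 1] = float(state)
    return phi

def q_value(w, state, action, n_actions):
    """Linear approximation of Q(s, a) as a dot product of weights and features."""
    phi = features(state, action, n_actions)
    return sum(wi * xi for wi, xi in zip(w, phi))

def semi_gradient_update(w, s, a, r, s_next, done, n_actions,
                         alpha=0.05, gamma=0.9):
    """Return new weights after one semi-gradient Q-learning step."""
    if done:
        target = r
    else:
        target = r + gamma * max(
            q_value(w, s_next, b, n_actions) for b in range(n_actions)
        )
    td_error = target - q_value(w, s, a, n_actions)
    # For a linear model the gradient of Q with respect to w is phi(s, a).
    phi = features(s, a, n_actions)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, phi)]

w0 = [0.0] * 4
w1 = semi_gradient_update(w0, s=2.0, a=1, r=1.0, s_next=3.0,
                          done=True, n_actions=2)
```

Only the weights in the block for the chosen action move, and the number of parameters no longer grows with the number of states.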
Quantisation of states
Shrink the number of states.
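One simple way to shrink the state space is uniform binning of each continuous state dimension, so that a Q-table can still be used. This is a sketch only: the function names, the per-dimension bounds, and the uniform bin layout are assumptions, and other quantisation schemes are equally possible.

```python
def quantise(x, low, high, n_bins):
    """Map a value to a bin index in 0..n_bins-1, clipping to [low, high]."""
    x = min(max(x, low), high)
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # x == high falls into the last bin

def quantise_state(state, bounds, n_bins):
    """Combine per-dimension bin indices into a single discrete state id."""
    state_id = 0
    for x, (low, high) in zip(state, bounds):
        state_id = state_id * n_bins + quantise(x, low, high, n_bins)
    return state_id

# A 2-dimensional state with 4 bins per dimension yields 16 discrete states.
example_id = quantise_state((0.25, -1.0), [(0.0, 1.0), (-1.0, 1.0)], n_bins=4)
```

The resulting integer can index the rows of a Q-table directly. Coarser bins mean fewer states but a cruder approximation of the dynamics.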
End-to-end reinforcement learning
Deep Q-Network (DQN)
Reinforcement learning for POMDPs
Unknown POMDP transition matrix
Deep Recurrent Q-Network