Reinforcement learning
Reinforcement learning for Markov Decision Processes
Unknown MDP transition matrix
Exploration to understand the MDP
Temporal difference learning for Markov Decision Processes
Temporal difference learning
Q-learning for Markov Decision Processes
Q-functions
When the transition matrix is unknown, state values alone cannot tell us which action to take, so we use Q-values instead.
\(Q(s,a)\) is the value of taking action \(a\) in state \(s\).
\(Q(s,a)=R_{s,a}+\gamma \mathbb{E}_{s'}[\max_{a'}Q(s',a')]\).
A Q-table assigns a real-valued Q-value to every state/action combination.
Value iteration for Q
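As a minimal sketch of the idea, the Bellman equation for \(Q\) can be applied repeatedly until the table stops changing. This assumes the transition probabilities and rewards are available (known or estimated); the array names `P` and `R`, the function `q_value_iteration`, and the two-state toy MDP are all hypothetical and only for illustration.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """P has shape (S, A, S) with P[s, a, s'] = Pr(s' | s, a); R has shape (S, A)."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        # Bellman backup: Q(s,a) = R(s,a) + gamma * E_{s'}[max_{a'} Q(s',a')]
        Q_new = R + gamma * (P @ Q.max(axis=1))
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

# Hypothetical toy MDP: in either state, action 0 leads to state 0 and
# action 1 leads to state 1. Only (state 1, action 1) is rewarded.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
R = np.array([[0.0, 0.0],
              [0.0, 1.0]])

Q = q_value_iteration(P, R, gamma=0.5)
greedy_policy = Q.argmax(axis=1)  # best action in each state
```

The greedy policy is read off the table by taking the arg max over actions in each row.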
\(\epsilon\)-greedy
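A rough sketch of tabular Q-learning with \(\epsilon\)-greedy exploration: with probability \(\epsilon\) take a random action, otherwise take the action with the highest current Q-value. The environment function `toy_step`, the hyperparameter values, and the fixed episode horizon are assumptions made for this example, not part of the notes.

```python
import numpy as np

def epsilon_greedy(Q, state, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the current table.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))

def q_learning(step, n_states, n_actions, episodes=500, horizon=20,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = 0  # assumed start state
        for _ in range(horizon):
            a = epsilon_greedy(Q, s, epsilon, rng)
            s_next, r = step(s, a)
            # Temporal difference update towards r + gamma * max_a' Q(s', a')
            td_target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (td_target - Q[s, a])
            s = s_next
    return Q

def toy_step(state, action):
    # Hypothetical deterministic environment: the action chosen is the next
    # state, and only staying in state 1 is rewarded.
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    return action, reward

Q_learned = q_learning(toy_step, n_states=2, n_actions=2)
learned_policy = Q_learned.argmax(axis=1)
```

Note that the learner never sees a transition matrix; it only samples transitions through `step`.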
Q-learning for large state Markov Decision Processes
Function approximation for Q
E.g. a neural network mapping a state/action pair to \(\mathbb{R}\).
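One possible illustration is a tiny one-hidden-layer network in NumPy that maps a state/action feature vector to a single real number and is nudged towards a target value by gradient descent. The class `TinyQNetwork`, the one-hot action encoding, and the fixed target of 1.0 are assumptions for this sketch; in Q-learning the target would typically be \(r+\gamma \max_{a'}Q(s',a')\).

```python
import numpy as np

class TinyQNetwork:
    """Hypothetical one-hidden-layer network: (state, action) features -> scalar."""

    def __init__(self, n_inputs, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, size=n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return float(h @ self.w2 + self.b2)

    def update(self, x, target, lr=0.01):
        # One gradient step on the squared error 0.5 * (Q(x) - target)^2.
        h = np.tanh(x @ self.W1 + self.b1)
        err = (h @ self.w2 + self.b2) - target
        grad_pre = err * self.w2 * (1.0 - h ** 2)  # gradient at the hidden pre-activation
        self.w2 -= lr * err * h
        self.b2 -= lr * err
        self.W1 -= lr * np.outer(x, grad_pre)
        self.b1 -= lr * grad_pre

# Assumed encoding: continuous state features concatenated with a one-hot action.
state = np.array([0.2, -0.4])
action_one_hot = np.array([0.0, 1.0])
x = np.concatenate([state, action_one_hot])

net = TinyQNetwork(n_inputs=x.size)
for _ in range(500):
    net.update(x, target=1.0)
q_estimate = net.predict(x)
```

The table lookup is replaced by `predict`, so states never seen before still get a Q-value estimate.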
Quantisation of states
Shrink the number of states.
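A small sketch of one way to do this: divide each continuous state dimension into equal-width bins and use the combined bin index as the row of a Q-table. The function `quantise`, the bounds, and the bin count are hypothetical choices for illustration.

```python
import numpy as np

def quantise(state, low, high, bins_per_dim):
    """Map a continuous state to a single integer index in [0, bins_per_dim ** d)."""
    state = np.asarray(state, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Position of each coordinate within its range, as a fraction.
    ratios = (state - low) / (high - low)
    # Per-dimension bin index; out-of-range values fall into the edge bins.
    idx = np.clip((ratios * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    # Flatten the per-dimension indices into one table index.
    return int(np.ravel_multi_index(tuple(idx), (bins_per_dim,) * state.size))

low, high = [-1.0, -1.0], [1.0, 1.0]
centre_index = quantise([0.0, 0.0], low, high, bins_per_dim=4)
corner_index = quantise([1.0, 1.0], low, high, bins_per_dim=4)
n_table_rows = 4 ** 2  # 16 discrete states instead of a continuum
```

Coarser bins give a smaller table but blur together states that may need different actions.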
End-to-end reinforcement learning
Deep Q-Network (DQN)
Reinforcement learning for POMDPs
Unknown POMDP transition matrix
Stacking frames
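A minimal sketch of frame stacking, assuming observations arrive as NumPy arrays: keep the last \(k\) observations and feed them to the agent together, so that information missing from a single observation (such as velocity) can be recovered. The class `FrameStack` and its method names are hypothetical.

```python
from collections import deque

import numpy as np

class FrameStack:
    """Keeps the k most recent frames and returns them as one stacked array."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # At the start of an episode, pad the history with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(np.asarray(first_frame))
        return self.observation()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it holds k frames.
        self.frames.append(np.asarray(frame))
        return self.observation()

    def observation(self):
        # Shape (k, *frame_shape), oldest frame first.
        return np.stack(list(self.frames), axis=0)

stack = FrameStack(k=3)
obs_start = stack.reset(np.zeros((2, 2)))
obs_next = stack.push(np.ones((2, 2)))
```

The stacked array is then used in place of the single observation as the input to the Q-function.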
Deep Recurrent Q-Network