Deep Q-Learning
Only discrete actions.
1. Predict $Q(s_t, a)$
2. Play action with highest Q value: $a_t = \arg\max_a Q(s_t, a)$
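The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name `greedy_action` and the Q values are hypothetical, and in practice the Q values would come from a neural network evaluated at state s_t.

```python
def greedy_action(q_values):
    """Return the index of the action with the highest Q value.

    q_values[a] stands for the predicted Q(s_t, a) of discrete action a.
    On ties, the lowest action index is returned.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical predictions Q(s_t, a) for three discrete actions.
q_values = [0.1, 0.7, 0.3]
a_t = greedy_action(q_values)  # picks action 1, the largest Q value
```

Enumerating every action like this is possible only because the action set is discrete and finite, which is why the method is restricted to discrete actions.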
$$\mathcal{L}=\frac{1}{2}\sum_B\big(R(s_{t_B},a_{t_B})+\max_a Q(s_{t_{B+1}},a)-Q(s_{t_B},a_{t_B})\big)^2=\frac{1}{2}\sum_B TD_{t_B}(s_{t_B},a_{t_B})^2$$
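As a rough sketch, the loss above can be evaluated on a small batch as follows. The names `td_error` and `dqn_loss`, the tuple layout of a transition, and the numbers are all assumptions made for illustration. The sketch follows the formula as written, so it uses no discount factor.

```python
def td_error(reward, q_next, q_taken):
    """TD error of one transition: R(s, a) + max_a' Q(s', a') - Q(s, a).

    reward  -- R(s_t, a_t)
    q_next  -- list of Q(s_{t+1}, a') for every discrete action a'
    q_taken -- Q(s_t, a_t) for the action that was actually played
    """
    return reward + max(q_next) - q_taken


def dqn_loss(batch):
    """Half the sum of squared TD errors over a batch of transitions."""
    return 0.5 * sum(
        td_error(reward, q_next, q_taken) ** 2
        for reward, q_next, q_taken in batch
    )


# Hypothetical batch of two transitions: (reward, Q(s', .), Q(s, a)).
batch = [
    (1.0, [0.5, 2.0], 2.5),  # TD = 1.0 + 2.0 - 2.5 = 0.5
    (0.0, [1.0, 0.0], 3.0),  # TD = 0.0 + 1.0 - 3.0 = -2.0
]
loss = dqn_loss(batch)  # 0.5 * (0.25 + 4.0) = 2.125
```

In an actual training loop the Q values would be network outputs and the loss would be minimized by gradient descent on the network parameters; here they are plain numbers so the arithmetic can be checked by hand.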
References
Deep Reinforcement Learning: Value Functions, DQN, Actor-Critic method, Back-propagation through stochastic functions -- Vishu Vijayan PV