
Deep Q-Learning


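Deep Q-Learning handles only discrete actions, because both choosing an action and computing the training target take a max over every possible action. At each step: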

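1. Predict $Q(s_t, a)$ for every action $a$.
2. Play the action with the highest Q value: $a_t = \arg\max_a Q(s_t, a)$.

The network is trained on a minibatch $B$ of transitions by minimizing half the sum of squared temporal-difference (TD) errors:

$$\mathcal{L}=\frac{1}{2}\sum_B\big(R(s_{t_B},a_{t_B})+\max_a Q(s_{t_B+1},a)-Q(s_{t_B},a_{t_B})\big)^2=\frac{1}{2}\sum_B TD_{t_B}(s_{t_B},a_{t_B})^2$$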
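A minimal pure-Python sketch of the action choice and the loss above, assuming a Q-function given as a callable that returns one Q value per discrete action, and transitions stored as `(state, action, reward, next_state)` tuples. The names `greedy_action`, `td_loss`, `toy_q` and the weights `W` are hypothetical and for illustration only. Like the formula, it applies no discount factor and gives terminal states no special treatment; it only evaluates the loss and performs no gradient update.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a Q-function maps a state to one Q value per action,
# and a transition is (state, action, reward, next_state).
QFunction = Callable[[Sequence[float]], List[float]]
Transition = Tuple[List[float], int, float, List[float]]


def greedy_action(q: QFunction, state: Sequence[float]) -> int:
    """Steps 1 and 2: predict Q(s_t, a) for all a, then return argmax_a."""
    q_values = q(state)
    # max over action indices; ties go to the lowest index
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_loss(q: QFunction, batch: List[Transition]) -> float:
    """L = 1/2 * sum over the batch of TD^2, TD = R + max_a Q(s', a) - Q(s, a)."""
    total = 0.0
    for state, action, reward, next_state in batch:
        target = reward + max(q(next_state))  # enumerates every action
        td = target - q(state)[action]
        total += td ** 2
    return 0.5 * total


# Hypothetical linear Q-function: 2 actions, 2 state features.
W = [[1.0, 0.0], [0.0, 2.0]]


def toy_q(state: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, state)) for row in W]


batch = [
    ([1.0, 1.0], 0, 0.5, [0.0, 1.0]),  # TD = 0.5 + 2.0 - 1.0 = 1.5
    ([2.0, 0.0], 1, 1.0, [1.0, 0.0]),  # TD = 1.0 + 1.0 - 0.0 = 2.0
]

print(greedy_action(toy_q, [1.0, 1.0]))  # Q = [1.0, 2.0], prints 1
print(td_loss(toy_q, batch))             # 0.5 * (2.25 + 4.0), prints 3.125
```

The `max(q(next_state))` call has to enumerate every action, which is why this approach is limited to a discrete action set. In a full training loop the target term is normally held fixed, so gradients flow only through $Q(s_{t_B}, a_{t_B})$.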
