Twin-Delayed Deep Deterministic Policy Gradient (TD3)


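A typical deep Q-learning model is not well suited to continuous action spaces: selecting an action requires maximizing Q over all actions, which is only straightforward when the actions form a finite set. Twin-delayed DDPG (TD3), an actor-critic method, is designed for this setting.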

Actor-Critic

Actor: \(\mathcal{S}\rightarrow\mathcal{A}\)

Critic: \(\mathcal{S},\mathcal{A}\rightarrow Q\)
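
The two mappings above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: both functions are single linear layers with random weights, and the names `make_linear`, `actor`, `critic`, and `max_action` are hypothetical, not taken from any library. In practice both would be deep neural networks.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Random (n_out x n_in) weight matrix for a toy linear layer (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def actor(state, weights, max_action=1.0):
    """Deterministic policy S -> A, squashed into [-max_action, max_action]."""
    return [max_action * math.tanh(sum(w * s for w, s in zip(row, state)))
            for row in weights]

def critic(state, action, weights):
    """Q-function (S, A) -> Q: scores a state-action pair with one scalar."""
    x = list(state) + list(action)  # concatenate state and action
    return sum(w * xi for w, xi in zip(weights[0], x))

rng = random.Random(0)
state_dim, action_dim = 3, 2
actor_w = make_linear(state_dim, action_dim, rng)
critic_w = make_linear(state_dim + action_dim, 1, rng)

state = [0.5, -0.2, 0.1]
action = actor(state, actor_w)             # continuous action vector
q_value = critic(state, action, critic_w)  # scalar estimate of Q(s, a)
```

The actor outputs a continuous action directly, so no maximization over actions is needed; the critic only has to score the action the actor proposes.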

Policy Gradient

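With a single actor-critic pair, as in DDPG, the critic tends to overestimate Q-values, and the actor then learns to exploit those errors. TD3 addresses this by training two critics and using the smaller of their two estimates.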
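
A common way to reduce the Q-value approximation problem, and the source of the "twin" in the name, is to keep two critics and build the learning target from the smaller of their estimates. The sketch below assumes the two next-state Q-values have already been computed by target critics; the function name `td3_target` and its parameters are hypothetical, and the target-policy noise and delayed actor updates of the full algorithm are left out.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2').

    `done` is 1.0 for a terminal transition and 0.0 otherwise, so the
    bootstrap term vanishes at the end of an episode.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)

# The two critics disagree (5.0 vs 4.0); the target uses the smaller value.
y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0, gamma=0.9)

# A terminal transition keeps only the immediate reward.
y_terminal = td3_target(reward=1.0, done=1.0, q1_next=5.0, q2_next=4.0)
```

Taking the minimum biases the target downward rather than upward, which tends to be less harmful because underestimated actions are simply chosen less often instead of being reinforced.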

DNN

References

  1. Jon Michaux, Off-Policy Actor-Critic Algorithms