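A typical deep Q-learning model is not well suited to continuous action spaces, because choosing an action requires maximizing the Q-value over all actions, which is only tractable when the actions form a small discrete set. Twin Delayed DDPG (TD3), an actor-critic method, is therefore introduced.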
Actor: \(\mathcal{S}\rightarrow\mathcal{A}\)
Critic: \(\mathcal{S}\times\mathcal{A}\rightarrow Q\)
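The two mappings above can be made concrete with a minimal NumPy sketch. It is only an illustration of the input and output shapes, not a trained model: the dimensions (`STATE_DIM`, `ACTION_DIM`, `HIDDEN`), the action bound `MAX_ACTION`, and the helper names `init_mlp` and `mlp` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, HIDDEN = 3, 2, 16
MAX_ACTION = 1.0  # assumed symmetric action bound


def init_mlp(in_dim, out_dim):
    """Random (untrained) weights for a one-hidden-layer MLP."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.1, (HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass: linear -> ReLU -> linear."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


actor_params = init_mlp(STATE_DIM, ACTION_DIM)
critic_params = init_mlp(STATE_DIM + ACTION_DIM, 1)


def actor(state):
    """Actor: maps a batch of states to bounded continuous actions."""
    return MAX_ACTION * np.tanh(mlp(actor_params, state))


def critic(state, action):
    """Critic: maps (state, action) pairs to one scalar Q-value each."""
    return mlp(critic_params, np.concatenate([state, action], axis=-1))


states = rng.normal(size=(4, STATE_DIM))  # batch of 4 states
actions = actor(states)                   # shape (4, ACTION_DIM)
q_values = critic(states, actions)        # shape (4, 1)
```

The actor outputs an action directly, so no maximization over actions is needed; the critic only scores the action it is given.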
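When only a single actor-critic pair is used, the Q-value approximation becomes a problem: the critic tends to overestimate Q-values, and the actor is then trained on those inflated estimates.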
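One way to see the difficulty with a single critic is a toy simulation, sketched below under simplifying assumptions: every action has a true value of 0, the critic's error is independent Gaussian noise with standard deviation `NOISE_STD`, and the continuous action space is replaced by `N_ACTIONS` candidate actions. Picking the action that looks best under one noisy critic and trusting that same critic's value gives an estimate biased upward. Taking the minimum of two independently noisy critics for the chosen action, the "twin" idea in TD3, pulls the estimate back down.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are assumptions for this toy experiment.
N_TRIALS, N_ACTIONS = 10_000, 10
NOISE_STD = 1.0

# Every candidate action is truly worth 0.
true_q = np.zeros((N_TRIALS, N_ACTIONS))

# Two critics with independent approximation errors.
q1 = true_q + rng.normal(0.0, NOISE_STD, true_q.shape)
q2 = true_q + rng.normal(0.0, NOISE_STD, true_q.shape)

# Single critic: choose the best-looking action and trust its value.
single_estimate = q1.max(axis=1).mean()

# Twin critics: same greedy choice from q1, but the value used is the
# smaller of the two critics' estimates for that action.
rows = np.arange(N_TRIALS)
best = q1.argmax(axis=1)
twin_estimate = np.minimum(q1[rows, best], q2[rows, best]).mean()

print(f"true value:       0.000")
print(f"single critic:    {single_estimate:.3f}")
print(f"min of two:       {twin_estimate:.3f}")
```

In this setup the single-critic estimate sits well above the true value of 0, while the minimum of two critics lands much closer to it. Real critics have correlated errors, so the effect in practice is less clean than in this simulation.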