Q-Learning


Cliff world
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t)+\alpha\left(r_{t+1}+\gamma\max_{a'}Q(s_{t+1},a')-Q(s_t,a_t)\right)$$
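
To make the update rule concrete, here is a minimal sketch of a single tabular update in Python. The function name `q_update`, the list-of-lists table layout, the `terminal` flag, and the toy numbers are all assumptions made for illustration; they are not taken from the original notes.

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=1.0, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is assumed to be a list of rows, one per state, each row holding
    one value per action (a hypothetical layout chosen for this sketch).
    """
    # Bootstrap from the greedy value of the next state.
    # Assumption: a terminal next state contributes no future value.
    best_next = 0.0 if terminal else max(Q[s_next])
    # TD error: r_{t+1} + gamma * max_a' Q(s_{t+1}, a') - Q(s_t, a_t)
    td_error = r + gamma * best_next - Q[s][a]
    # Move the current estimate a fraction alpha toward the target.
    Q[s][a] += alpha * td_error
    return td_error


# Toy table with 3 states and 2 actions (values are made up).
Q = [[0.0, 0.0] for _ in range(3)]
Q[1] = [2.0, 4.0]

# One ordinary step: from state 0, action 1, reward -1, landing in state 1.
delta = q_update(Q, s=0, a=1, r=-1.0, s_next=1, alpha=0.5, gamma=0.9)

# One terminal step: a large penalty with no bootstrapping.
delta_terminal = q_update(Q, s=2, a=0, r=-100.0, s_next=0,
                          alpha=0.5, gamma=0.9, terminal=True)
```

Because the target uses the maximum over next actions rather than the action actually taken next, the update is off-policy: the table is pushed toward the greedy policy's values regardless of how the agent explores.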

References