Value Iteration


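One drawback to policy iteration is that each of its iterations involves policy evaluation, which may itself be an iterative computation requiring multiple sweeps through the state set. Policy evaluation can be truncated without losing the convergence guarantee; the special case in which it is stopped after just one sweep (one backup of each state) is called value iteration.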
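The one-sweep truncation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the transition format (`transitions[s][a]` as a list of `(prob, next_state, reward)` triples), the function name `value_iteration`, the three-state `CHAIN` example, and the parameters `gamma` and `theta` are all assumptions made for this sketch. It uses a synchronous sweep (a fresh value list per sweep) for clarity.

```python
def value_iteration(transitions, gamma=0.9, theta=1e-8):
    """Repeat one-sweep backups until the largest change falls below theta.

    transitions[s][a] is a list of (prob, next_state, reward) triples
    (a hypothetical format chosen for this sketch).
    """
    values = [0.0] * len(transitions)
    while True:
        new_values = []
        for s in range(len(transitions)):
            if not transitions[s]:
                # Terminal state: no actions, so its value stays at 0.
                new_values.append(0.0)
                continue
            # One backup: maximum over actions of the expected one-step return.
            new_values.append(max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in transitions[s].values()
            ))
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < theta:
            return values


# Hypothetical 3-state chain: every move costs -1 and state 2 is terminal.
CHAIN = [
    {"stay": [(1.0, 0, -1.0)], "right": [(1.0, 1, -1.0)]},
    {"stay": [(1.0, 1, -1.0)], "right": [(1.0, 2, -1.0)]},
    {},
]

values = value_iteration(CHAIN)
print(values)
```

On this small chain the values settle at roughly -1.9, -1.0 and 0.0 after three sweeps, which matches moving right from each state under a discount of 0.9.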

Small world

Steps

Asynchronous DP

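A major drawback of the DP methods is that they involve operations over the entire state set of the MDP, that is, they require sweeps of the state set. If the state set is very large, even a single sweep can be prohibitively expensive.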

Asynchronous DP backs up the values of states in place and in any order, so some states may be backed up several times before the values of others are backed up once. To converge correctly, however, it must still continue to back up the values of all the states.
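A small sketch may help make both points concrete: the order of backups is free, but no state can be ignored. The names `backup`, `async_value_iteration`, `CHAIN`, `fair` and `starved`, the transition format, and the discount `gamma` are assumptions made for this illustration only.

```python
def backup(values, transitions, s, gamma):
    """Return the backed-up value of state s from the current value table."""
    if not transitions[s]:
        return 0.0  # terminal state: no actions available
    return max(
        sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
        for outcomes in transitions[s].values()
    )


def async_value_iteration(transitions, order, gamma=0.9):
    """Back up states in place, one at a time, in the given order."""
    values = [0.0] * len(transitions)
    for s in order:
        # In-place update: later backups immediately see this new value.
        values[s] = backup(values, transitions, s, gamma)
    return values


# Hypothetical 3-state chain: every move costs -1 and state 2 is terminal.
# transitions[s][a] is a list of (prob, next_state, reward) triples.
CHAIN = [
    {"stay": [(1.0, 0, -1.0)], "right": [(1.0, 1, -1.0)]},
    {"stay": [(1.0, 1, -1.0)], "right": [(1.0, 2, -1.0)]},
    {},
]

# State 0 is backed up three times before state 1 is backed up once.
fair = async_value_iteration(CHAIN, order=[0, 0, 0, 1, 0, 0, 1, 0])

# State 1 is never backed up, so state 0 settles on a wrong value.
starved = async_value_iteration(CHAIN, order=[0] * 20)

print(fair, starved)
```

In the first run the uneven order still reaches about -1.9 for state 0, because state 1 is eventually backed up. In the second run state 1 keeps its initial value of 0, and state 0 stays stuck at -1.0.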
