Value Iteration

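One drawback to policy iteration is that each of its iterations involves policy evaluation, which is itself an iterative computation that may require multiple sweeps through the state set. Policy evaluation can be truncated without losing the convergence guarantees of policy iteration. Value iteration is the special case in which policy evaluation is stopped after just one sweep (one backup of each state).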

Small world

Steps

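Initialize the values arbitrarily, e.g. \(V_0(s)=0\) for every \(s\).
Successive improvements are based on the Bellman optimality backup, written here in standard notation: \(V_{k+1}(s)=\max_a \sum_{s'} P(s' \mid s,a)\,[R(s,a,s')+\gamma V_k(s')]\).
Stop when the value function stops changing, i.e. it has converged: \(\max_s |V_{k+1}(s)-V_k(s)| < \theta\) for a small threshold \(\theta > 0\).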
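
The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the dictionary encoding of the MDP, the names `value_iteration`, `gamma` and `theta`, and the three-state chain used as input are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: mdp[s][a] is a list of (probability, next_state, reward).
# A state with no actions is treated as terminal, with value 0.
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def value_iteration(mdp: MDP, gamma: float = 0.9, theta: float = 1e-8) -> Dict[int, float]:
    values = {s: 0.0 for s in mdp}  # step 1: V_0(s) = 0 for every s
    while True:
        new_values = {}
        for s, actions in mdp.items():
            # Step 2: one backup per state, using only the previous sweep's values.
            new_values[s] = max(
                (sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                 for outcomes in actions.values()),
                default=0.0,  # terminal state: no actions to maximize over
            )
        # Step 3: stop once the largest change in a sweep falls below theta.
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < theta:
            return values


# Illustrative chain 0 <-> 1 -> 2, where 2 is terminal and every move costs -1.
chain: MDP = {
    0: {"left": [(1.0, 0, -1.0)], "right": [(1.0, 1, -1.0)]},
    1: {"left": [(1.0, 0, -1.0)], "right": [(1.0, 2, -1.0)]},
    2: {},
}
V = value_iteration(chain, gamma=1.0)
print(V)
```

Each sweep builds a fresh `new_values` dictionary, so every backup in a sweep sees the same old values. Updating `values` in place instead would also converge and is a first step toward the asynchronous variant.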

Asynchronous DP

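A major drawback of the DP methods above is that they involve operations over the entire state set of the MDP, i.e. they require sweeps of the state set. When the state set is very large, even a single sweep can be prohibitively expensive.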

Asynchronous DP backs up the values of states in place and in any order, so some states may be backed up several times before the values of others are backed up once. To converge, however, it must still continue to back up the values of all the states.
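
As a rough sketch of the idea, the Python below applies in-place backups in a caller-supplied order rather than in full sweeps. The MDP encoding, the helper names `backup` and `async_value_iteration`, and the example chain and schedule are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical encoding: mdp[s][a] is a list of (probability, next_state, reward).
# A state with no actions is treated as terminal, with value 0.
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def backup(mdp: MDP, values: Dict[int, float], s: int, gamma: float) -> float:
    # Best expected one-step return from s under the current value estimates.
    return max(
        (sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
         for outcomes in mdp[s].values()),
        default=0.0,
    )


def async_value_iteration(mdp: MDP, order: Iterable[int], gamma: float = 0.9) -> Dict[int, float]:
    values = {s: 0.0 for s in mdp}
    for s in order:
        # In-place update: later backups immediately see this new value.
        values[s] = backup(mdp, values, s, gamma)
    return values


# Illustrative chain 0 <-> 1 -> 2, where 2 is terminal and every move costs -1.
chain: MDP = {
    0: {"left": [(1.0, 0, -1.0)], "right": [(1.0, 1, -1.0)]},
    1: {"left": [(1.0, 0, -1.0)], "right": [(1.0, 2, -1.0)]},
    2: {},
}

# State 0 is backed up three times before state 1 is backed up once.
stuck = async_value_iteration(chain, order=[0, 0, 0], gamma=1.0)
done = async_value_iteration(chain, order=[0, 0, 0, 1, 0, 0], gamma=1.0)
print(stuck, done)
```

In this example, backing up state 0 alone leaves its value at -1, because it keeps bootstrapping from the stale estimate of state 1. Only after state 1 is backed up can state 0 reach its optimal value of -2, which illustrates why no state can be ignored indefinitely.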
