Exercise 3.25 - Optimal state-value in terms of optimal action-value

Problem Statement Give an equation for $v_*$ in terms of $q_*$.

Solution

We know from this chapter that

$$v_*(s) = \max\limits_{a \in \mathcal{A}(s)} q_{\pi_*}(s, a) \quad \forall s \in \mathcal{S}$$

But $q_* \doteq q_{\pi_*}$ – this is just a difference in notation. Both refer to the optimal action-value function, that is, the expected return for taking action $a$ in state $s$ and thereafter following an optimal policy.

$$\therefore \boxed{v_*(s) = \max\limits_{a \in \mathcal{A}(s)} q_*(s,a) \quad \forall s \in \mathcal{S}}$$
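
As a quick sanity check, the boxed relation can be sketched for a tabular problem. This is only an illustration: the states, actions, and numbers in `q_star` are made up, and `optimal_state_values` is a hypothetical helper name, not something from the book.

```python
# Hypothetical optimal action values q_*(s, a) for a toy problem.
# The states, actions, and numbers are invented for illustration only.
q_star = {
    "s0": {"left": 1.0, "right": 2.5},
    "s1": {"left": 0.3, "right": -1.0},
}


def optimal_state_values(q):
    """Return v_*(s) = max over a in A(s) of q_*(s, a) for every state s."""
    # Each state carries its own action set A(s), so the max is taken per state.
    return {s: max(action_values.values()) for s, action_values in q.items()}


v_star = optimal_state_values(q_star)
print(v_star)  # {'s0': 2.5, 's1': 0.3}
```

Storing the table as a dictionary of dictionaries lets each state have a different action set $\mathcal{A}(s)$, which matches the $\max$ over $a \in \mathcal{A}(s)$ in the equation.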