$q_\pi(s, a)$ is the expected future return given a particular (state, action) pair: the agent takes action $a$ in state $s$ and follows policy $\pi$ thereafter. $v_\pi(s)$ is the expected future return given only the state $S_t = s$ under policy $\pi$, so it averages over all actions the agent may choose in that state. Hence, if we average the action-value function over all possible actions $A_t = a$, weighting each term by the probability $\pi(a | s)$ that the action is chosen, we obtain $v_\pi$.
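
Written out, the relationship described above is $v_\pi(s) = \sum_a \pi(a | s) \, q_\pi(s, a)$. A minimal Python sketch of that weighted average for a single state follows. The function name `state_value`, the action labels, and all numbers are hypothetical and chosen only for illustration; they do not come from the original text.

```python
def state_value(policy_probs, action_values):
    """Compute v_pi(s) = sum_a pi(a|s) * q_pi(s, a) for one state s.

    policy_probs:  dict mapping action -> pi(a|s)   (illustrative input)
    action_values: dict mapping action -> q_pi(s,a) (illustrative input)
    """
    if set(policy_probs) != set(action_values):
        raise ValueError("policy and action values must cover the same actions")
    if abs(sum(policy_probs.values()) - 1.0) > 1e-9:
        raise ValueError("pi(.|s) must sum to 1")
    # Weight each action value by the probability of choosing that action.
    return sum(policy_probs[a] * action_values[a] for a in policy_probs)


# Hypothetical state with two actions.
pi_s = {"left": 0.25, "right": 0.75}  # pi(a|s)
q_s = {"left": 2.0, "right": 4.0}     # q_pi(s, a)

v_s = state_value(pi_s, q_s)
print(v_s)  # 0.25 * 2.0 + 0.75 * 4.0 = 3.5
```

Note that for a deterministic policy, which puts probability 1 on a single action, the sum collapses to that action's value, so $v_\pi(s) = q_\pi(s, \pi(s))$.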