Problem Statement
Suppose $\gamma = 0.5$ and the following sequence of rewards is received: $R_1 = -1, R_2 = 2, R_3 = 6, R_4 = 3, R_5 = 2$, with $T=5$. What are $G_0, G_1, \ldots, G_5$? Hint: Work backwards.
Solution
Since $T$ is defined, we are under an episodic reward/return formulation, in which the return satisfies the recursion $G_t = R_{t+1} + \gamma G_{t+1}$ for $t < T$.
So $G_0 = R_1 + \gamma G_1$, $G_1 = R_2 + \gamma G_2$, $G_2 = R_3 + \gamma G_3$, $G_3 = R_4 + \gamma G_4$, $G_4 = R_5 + \gamma G_5$, and $G_5 = 0$, since no more rewards can be obtained once the terminal state at $t=5$ has been reached.
Working backwards from $G_5 = 0$ with $\gamma = 0.5$, we get $G_4 = 2 + 0\gamma = 2$, $G_3 = 3 + 2\gamma = 4$, $G_2 = 6 + 4\gamma = 8$, $G_1 = 2 + 8\gamma = 6$, and $G_0 = -1 + 6\gamma = 2$.
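The backward recursion can also be checked numerically. The following is a minimal sketch, not part of the original exercise; the function name `discounted_returns` is hypothetical, and it assumes the rewards are passed as a list `[R_1, ..., R_T]`, so that `rewards[t]` holds $R_{t+1}$.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_T] given rewards [R_1, ..., R_T] (hypothetical helper)."""
    T = len(rewards)
    returns = [0.0] * (T + 1)  # G_T = 0: no rewards follow the terminal step
    # Sweep backwards, applying G_t = R_{t+1} + gamma * G_{t+1}
    for t in range(T - 1, -1, -1):
        # rewards[t] is R_{t+1} because the list is 0-indexed
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


rewards = [-1, 2, 6, 3, 2]  # R_1, ..., R_5 from the problem statement
print(discounted_returns(rewards, gamma=0.5))
# → [2.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```

The printed list is $G_0, \ldots, G_5$ in order, matching the values obtained by hand.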