Deeplearning_vault/Fitted Q-iteration.md at main · Idate96/Deeplearning_vault

A common choice is to choose the lastst policy to sample from the environment.

Before collecting more data you can take K steps. ![[q_iter.png]]