These links explain very well how Proximal Policy Optimization (PPO) works:
- https://stackoverflow.com/questions/46422845/what-is-the-way-to-understand-proximal-policy-optimization-algorithm-in-rl
- https://keras.io/examples/rl/ppo_cartpole/
- https://openai.com/blog/openai-baselines-ppo/
- https://blogs.oracle.com/ai-and-datascience/post/reinforcement-learning-proximal-policy-optimization-ppo

