A gradient descent Sarsa agent that controls a custom two degrees-of-freedom arm.
- Low-memory-footprint implementation of the update
The agent controls an arm with two degrees of freedom. Each joint has 155 degrees of rotation. The elbow joint moves a rod tipped with an LED, which the agent can toggle on and off. A photoresistor on the surface detects whether the LED is pointed at it.
The agent must point the LED at the photoresistor in as few actions as possible. An episode ends when the photoresistor reading rises above a threshold, after which the agent is reset to a random start position. The agent is penalized for turning on the LED unnecessarily.
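As a rough illustration of what a gradient descent Sarsa update with a small memory footprint can look like, here is a minimal Python sketch. It is not the code used in this project. It assumes a linear value function over binary features, so a state is just a short list of active feature indices and each update touches only those weights. The class name `LinearSarsa`, the feature encoding, the hyperparameters, and the reward of -1 per step in the usage lines are all hypothetical placeholders; the actual reward function is specified in the writeup.

```python
import random


class LinearSarsa:
    """Hypothetical sketch: linear gradient descent Sarsa over binary features."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=1.0, epsilon=0.1):
        self.n_actions = n_actions
        self.alpha = alpha      # step size (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        # One weight vector per action; this is the only learned state.
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, active, action):
        # With binary features, Q(s, a) is the sum of the active weights.
        return sum(self.w[action][i] for i in active)

    def choose(self, active, rng=random):
        # Epsilon-greedy action selection.
        if rng.random() < self.epsilon:
            return rng.randrange(self.n_actions)
        values = [self.q(active, a) for a in range(self.n_actions)]
        return values.index(max(values))

    def update(self, active, action, reward, next_active=None, next_action=None):
        # Sarsa target: r + gamma * Q(s', a'), or just r on the terminal step.
        target = reward
        if next_active is not None:
            target += self.gamma * self.q(next_active, next_action)
        delta = target - self.q(active, action)
        # The gradient of Q with respect to each active weight is 1,
        # so only the active weights of the taken action change.
        step = self.alpha * delta
        for i in active:
            self.w[action][i] += step
        return delta


# Usage with made-up features and rewards: a terminal step, then a normal step.
agent = LinearSarsa(n_features=4, n_actions=2, alpha=0.5, epsilon=0.0)
agent.update(active=[0, 1], action=0, reward=-1.0)
agent.update(active=[2], action=1, reward=-1.0, next_active=[0, 1], next_action=0)
```

Because each update reads and writes only the active weights of one action, the working memory beyond the weight table is a handful of scalars, which is one plausible way to keep the update cheap on constrained hardware.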
For details on the implementation and approach, including the specification of the reward function, see the writeup. You can also watch a video of the agent in action.


