Demo video: `mlagents.tennis.1.mp4`
This project is a Unity ML-Agents setup that simulates a tennis environment, where AI-controlled Racquet Agents learn to hit a ball back and forth across a court. Training was performed with the ML-Agents Toolkit and CUDA 12.1 GPU acceleration, enabling efficient reinforcement learning over 10 million steps.
- Custom Tennis Court: The environment is designed to replicate a basic tennis setup with two sides.
- Racquet Agents: Trained with reinforcement learning to reliably hit the ball back across the court.
- Ball Spawning Mechanism: The ball spawns from the opponent's side and is launched toward the Racquet Agent with an initial force (sketched in code after this list).
- Multi-Agent Training: We are developing a multi-agent setup where two AI-controlled agents will play against each other instead of a single agent responding to a spawned ball.
- ML-Agents Training: Utilizes Unity ML-Agents with CUDA GPU acceleration for efficient deep reinforcement learning.
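The spawning logic can be as simple as instantiating a ball prefab on the opponent's side and applying an impulse toward the agent. Below is a minimal Unity C# sketch of that idea; the field names (`ballPrefab`, `spawnPoint`, `agentTarget`) and the force value are illustrative assumptions, not the repository's actual code.

```csharp
using UnityEngine;

// Hypothetical spawner: launches a ball from the opponent's side
// toward the Racquet Agent with a one-off impulse.
public class BallSpawner : MonoBehaviour
{
    public GameObject ballPrefab;   // ball prefab with a Rigidbody attached
    public Transform spawnPoint;    // point on the opponent's side of the net
    public Transform agentTarget;   // rough position of the Racquet Agent
    public float launchForce = 10f; // magnitude of the initial impulse

    public void SpawnBall()
    {
        GameObject ball = Instantiate(ballPrefab, spawnPoint.position, Quaternion.identity);
        Rigidbody rb = ball.GetComponent<Rigidbody>();

        // Aim at the agent with a slight upward bias so the ball clears the net.
        Vector3 direction = (agentTarget.position - spawnPoint.position).normalized
                            + Vector3.up * 0.5f;
        rb.AddForce(direction.normalized * launchForce, ForceMode.Impulse);
    }
}
```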
Before running this project, ensure you have:
- Unity 2022+ (or a compatible version with ML-Agents support)
- Python 3.8+
- ML-Agents Toolkit (`pip install mlagents`)
- CUDA 12.1 (for GPU-accelerated training)
- PyTorch with GPU support
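Once PyTorch is installed, you can confirm it actually sees the GPU before starting a long run (a quick sanity check, not part of the repository's scripts):

```bash
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```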
```bash
git clone https://github.com/yourusername/mlagents-tennis.git
cd mlagents-tennis
```

Run the following command to install the required Python dependencies:
```bash
pip install -r requirements.txt
```

To train the agent, navigate to the Unity project directory and run:
```bash
mlagents-learn config/fulltennis.yaml --run-id=tennis_agent --env=Build/Tennis --train
```

This command initiates training for 10 million steps, leveraging GPU acceleration. Customize the config file to suit your requirements.
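For reference, a trainer configuration for this run might look like the sketch below. It assumes the current ML-Agents YAML schema and a behavior named `Tennis`; the actual `config/fulltennis.yaml` may differ, and the hyperparameters shown are illustrative defaults rather than tuned values.

```yaml
behaviors:
  Tennis:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 10000000   # the 10 million steps used in this project
    time_horizon: 64
    summary_freq: 50000
```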
After training, you can run the trained model using:
```bash
mlagents-learn config/trainer_config.yaml --run-id=tennis_agent --env=Build/Tennis --resume
```

Alternatively, you can load the trained model in Unity and play the simulation directly.
- Algorithm: PPO (Proximal Policy Optimization)
- Training Steps: 10 million
- Observations:
- Ball position and velocity
- Racquet position and rotation
- Court boundaries
- Rewards:
  - Successfully returning a ball is broken into sequential steps:
    - Track the ball (positive reward for moving toward the ball, negative reward for moving away)
    - Hit the ball
    - Hit it hard enough that it clears the net
    - Land it on the opponent's court
  - All four steps are rewarded, which gives the agent quick feedback, with larger rewards for each later step (see the agent sketch after this list).
  - This proved to be the most effective reward scheme: the brute-force approach of giving a sparse reward only when the ball lands on the opponent's court is ineffective, since this is a continuous action space with far too many possible action sequences to discover a full return by chance.
- Action Space: Continuous (for precise movement and hitting)
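To make the observation and reward design concrete, here is a hedged C# sketch of what the agent class could look like. It uses the standard Unity ML-Agents API (`Agent`, `CollectObservations`, `OnActionReceived`, `AddReward`), but the field names, reward magnitudes, and the `ball` tag are illustrative assumptions, not the repository's actual values.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RacquetAgent : Agent
{
    public Transform ball;     // the tennis ball
    public Rigidbody ballRb;   // its rigidbody, for velocity
    private float lastDistance;

    public override void OnEpisodeBegin()
    {
        lastDistance = Vector3.Distance(transform.localPosition, ball.localPosition);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Ball position and velocity, racquet position and rotation.
        sensor.AddObservation(ball.localPosition);
        sensor.AddObservation(ballRb.velocity);
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(transform.localRotation);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Continuous actions, e.g. lateral movement and swing strength.
        float move = actions.ContinuousActions[0];
        float swing = actions.ContinuousActions[1];
        // ... apply movement and swing to the racquet here ...

        // Step 1 shaping: small positive reward for closing the distance
        // to the ball, small negative reward for drifting away.
        float distance = Vector3.Distance(transform.localPosition, ball.localPosition);
        AddReward(distance < lastDistance ? 0.01f : -0.01f);
        lastDistance = distance;
    }

    // Steps 2-4 would be rewarded from physics callbacks: a contact
    // reward on hitting the ball, a larger one when it clears the net,
    // and the largest when it lands on the opponent's court.
    private void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("ball"))
        {
            AddReward(0.1f); // step 2: hit the ball
        }
    }
}
```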
- We are currently developing a multi-agent version of this environment, where two AI-controlled agents will compete against each other.
- Instead of the ball always spawning from the same side, agents will take turns serving and rallying.
- This setup will allow for more dynamic training and competitive reinforcement learning.
- After 10 million steps, the Racquet Agent achieves a 100% success rate in hitting the ball back.
- The trained agent adapts to different ball speeds and angles.
- Add an AI opponent to create a full two-player game.
- Improve the physics for more realistic ball interactions.
- Implement different shot types (e.g., lobs, spins).
- Expand the multi-agent setup to include doubles matches.
Developed using the Unity ML-Agents Toolkit, with CUDA 12.1 acceleration for training.
For any issues or contributions, feel free to open a pull request or create an issue in the repository.