# Inverted Double Pendulum Example

This example demonstrates using NEAT to evolve a neural network controller for the Gymnasium `InvertedDoublePendulum-v5` environment.

## Problem Description

The inverted double pendulum consists of two poles connected serially and mounted on a cart that moves along a frictionless track. The objective is to balance both poles in an upright position by applying horizontal forces to the cart.

### Environment Details

- **Observation Space**: 9-dimensional continuous vector containing:
  - Position of the cart along the track
  - Sine and cosine of both pole angles (sin θ1, sin θ2, cos θ1, cos θ2)
  - Velocity of the cart
  - Angular velocities of both poles
  - Constraint force acting on the cart

- **Action Space**: 1-dimensional continuous action representing the force applied to the cart (ranging from -1 to 1)

- **Reward**: The agent receives an alive bonus of 10 per timestep, minus penalties based on how far the tip of the second pole is from upright and on the joint velocities. The episode terminates when the tip of the second pole drops below a height threshold, and is truncated when the maximum number of steps (1000) is reached.

- **Success Criterion**: A fitness of 9000+ (staying balanced for most/all of the maximum 1000 steps over multiple episodes)

## Files

- `evolve-feedforward.py` - Main evolution script using feedforward networks
- `config-feedforward` - NEAT configuration file
- `test-feedforward.py` - Script to test and visualize trained controllers
- `README.md` - This file

## Requirements

```bash
pip install neat-python "gymnasium[mujoco]"
```

For visualization of the environment, you may also need:
```bash
pip install pygame
```

## Usage

### Training a Controller

To evolve a controller from scratch:

```bash
python evolve-feedforward.py
```

This will:
1. Create a population of 150 random neural networks
2. Evolve them for up to 300 generations
3. Use parallel evaluation across all CPU cores
4. Save checkpoints every 10 generations
5. Save the best genome as `winner-feedforward.pickle`
6. Generate visualization files (fitness plots, network diagrams)

The evolution will stop when a genome achieves the fitness threshold (9000.0) or after 300 generations.

### Testing a Trained Controller

To test a trained controller with visualization:

```bash
python test-feedforward.py
```

Or to test a specific genome file:

```bash
python test-feedforward.py path/to/genome.pickle
```

This will run the controller for 5 episodes and display:
- Real-time visualization of the pendulum
- Step count and fitness for each episode
- Average, max, and min fitness across episodes

### Resuming from a Checkpoint

If training is interrupted, you can resume from the most recent checkpoint:

```python
import glob
import os

import neat

local_dir = os.path.dirname(__file__)
config_path = os.path.join(local_dir, 'config-feedforward')
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     config_path)

# Find the most recent checkpoint (highest generation number).
checkpoints = glob.glob(os.path.join(local_dir, 'neat-checkpoint-*'))
latest = max(checkpoints, key=lambda p: int(p.rsplit('-', 1)[1]))
pop = neat.Checkpointer.restore_checkpoint(latest)
# Continue evolution
# ... (same as in evolve-feedforward.py)
```

## Configuration Notes

The `config-feedforward` file contains several important parameters:

- **Population size**: 150 individuals
- **Network structure**: Starts with no hidden nodes, allowing NEAT to evolve the topology
- **Activation functions**: tanh (default), with sigmoid and relu available through mutation
- **Mutation rates**: Configured to allow both structural and weight mutations
- **Speciation threshold**: 3.0 for maintaining diversity

These parameters can be tuned based on your computational resources and desired performance.
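
For orientation, the parameters above would appear in `config-feedforward` along these lines (illustrative values only; any setting not listed above is an assumption, so consult the actual file):

```ini
[NEAT]
fitness_criterion     = max
fitness_threshold     = 9000.0
pop_size              = 150

[DefaultGenome]
num_inputs            = 9
num_hidden            = 0
num_outputs           = 1
activation_default    = tanh
activation_options    = tanh sigmoid relu
activation_mutate_rate = 0.1

[DefaultSpeciesSet]
compatibility_threshold = 3.0
```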

## Tips for Success

1. **Patience**: This is a challenging control problem. Evolution may take many generations to find good solutions.

2. **Parallel evaluation**: The script uses all available CPU cores by default. Adjust the number of workers in `evolve-feedforward.py` if needed.

3. **Hyperparameter tuning**: If evolution stagnates, try:
   - Increasing population size
   - Adjusting mutation rates
   - Changing the activation functions
   - Modifying the compatibility threshold

4. **Multiple runs**: Due to the stochastic nature of evolution, running multiple independent trials may help find better solutions.

## Expected Results

A successfully evolved controller should:
- Balance both poles for the full 1000 steps
- Achieve fitness scores consistently above 9000
- Use relatively small networks (often fewer than 10 nodes)

## References

- [NEAT Paper](http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf)
- [Gymnasium Documentation](https://gymnasium.farama.org/environments/mujoco/inverted_double_pendulum/)
- [neat-python Documentation](https://neat-python.readthedocs.io/)