Welcome to the Quadruped Robot Reinforcement Learning Framework, a project that combines cutting-edge reinforcement learning techniques with MuJoCo physics simulation to train quadruped robots for stable and efficient locomotion. This framework allows you to experiment with custom environments, dynamic reward functions, and powerful actor-critic models to achieve state-of-the-art results.
*(Demo video: `output.mp4`)*
This project focuses on teaching a quadruped robot to walk using reinforcement learning techniques. It leverages the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm within a custom-designed MuJoCo-based environment (`QuadroboEnv`).
- Custom OpenAI Gym-compatible environment.
- Dynamic reward function emphasizing forward motion, stability, and energy efficiency.
- TD3 model with tunable parameters.
- Logging, checkpointing, and evaluation for seamless experimentation.
At its core, this project simulates a quadruped robot tasked with navigating a flat terrain. The robot learns to walk, balance, and optimize its gait through continuous interaction with the environment. A carefully designed reward system encourages the robot to move forward efficiently while penalizing energy wastage, unstable orientations, and off-track movements.
The project is well-organized into multiple modules for ease of development and scalability:
docs/
├── custom_env.md # Documentation for the custom environment
├── installation_and_troubleshoot.md # Installation and troubleshooting steps
├── model_definition.md # Details of the TD3 model implementation
├── project_structure.md # Explanation of the project structure
├── reward_design.md # Detailed breakdown of the reward function
└── training_and_evaluation.md # Training and evaluation procedures
Refer to the Project Structure Documentation for a complete breakdown.
Refer to Installation and Troubleshooting Documentation for detailed steps.
Quick start:
- Install Miniconda and create a virtual environment:
conda create -n mujoco_openai python=3.10
conda activate mujoco_openai
- Install dependencies:
pip install -r requirements.txt
- Install the custom environment:
pip install .
Run the training script to teach the robot:
python scripts/train.py
Adjust the parameters in the `configs/` directory.
Evaluate a trained model:
python scripts/evaluate.py
Refer to the Training and Evaluation Documentation for more details.
The reward function is designed to:
- Encourage forward motion.
- Penalize sideways drift and unstable orientation.
- Promote energy-efficient gait patterns.
- Reward the robot for maintaining a target velocity and balance.
For a comprehensive explanation, see the Reward Design Documentation.
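To make the shaping terms above concrete, here is a minimal sketch of such a reward in NumPy. The weights, the `target_vel` default, and the function signature are illustrative assumptions, not the framework's actual values; see the Reward Design Documentation for the real formulation.

```python
import numpy as np

def compute_reward(forward_vel, lateral_vel, roll, pitch, torques, target_vel=1.0):
    """Hypothetical reward combining the terms described above.

    All coefficients are placeholders chosen for illustration only.
    """
    forward_bonus = 2.0 * forward_vel                            # encourage forward motion
    velocity_tracking = -0.5 * (forward_vel - target_vel) ** 2   # reward holding the target velocity
    drift_penalty = -1.0 * abs(lateral_vel)                      # penalize sideways drift
    tilt_penalty = -0.5 * (roll ** 2 + pitch ** 2)               # penalize unstable orientation
    energy_penalty = -0.005 * float(np.sum(np.square(torques)))  # promote energy efficiency
    return (forward_bonus + velocity_tracking + drift_penalty
            + tilt_penalty + energy_penalty)
```

For example, a robot moving exactly at the target velocity with no drift, no tilt, and zero torque would earn only the forward-motion bonus.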
The `QuadroboEnv` environment is the heart of this project. It is a MuJoCo-based environment designed to simulate the physics of a quadruped robot. The environment provides:
- Joint positions, velocities, and forces.
- Orientation feedback (roll, pitch, yaw).
- Contact forces for each leg.
Details are available in the Custom Environment Documentation.
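As a rough illustration of how those signals could form an observation vector, here is a sketch assuming a hypothetical 8-joint, 4-legged robot; the actual `QuadroboEnv` may size, order, or scale these components differently.

```python
import numpy as np

# Illustrative observation layout only -- component sizes are assumptions.
joint_pos = np.zeros(8)        # joint positions
joint_vel = np.zeros(8)        # joint velocities
joint_frc = np.zeros(8)        # joint forces/torques
orientation = np.zeros(3)      # roll, pitch, yaw
contact_forces = np.zeros(4)   # one contact force per leg

obs = np.concatenate([joint_pos, joint_vel, joint_frc,
                      orientation, contact_forces])
```

Concatenating the pieces in a fixed order keeps the observation space flat, which is what most off-the-shelf actor-critic implementations expect.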
The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used to train the robot. The model features:
- Actor-critic networks for continuous action spaces.
- Noise injection for exploration.
- Soft updates to stabilize learning.
Read more in the Model Definition Documentation.
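Two of the TD3 ingredients listed above, soft target updates and exploration noise, can be sketched in a few lines. This is a generic illustration in NumPy, not the framework's actual implementation; the parameter-dict representation and the `tau`/`sigma` defaults are assumptions.

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target.

    Parameters are represented here as dicts of arrays for illustration.
    """
    return {k: tau * online_params[k] + (1.0 - tau) * target_params[k]
            for k in target_params}

def add_exploration_noise(action, sigma=0.1, low=-1.0, high=1.0, rng=None):
    """Gaussian exploration noise, clipped to the action bounds."""
    rng = rng or np.random.default_rng()
    noisy = action + rng.normal(0.0, sigma, size=action.shape)
    return np.clip(noisy, low, high)
```

Because `tau` is small, the target networks trail the online networks slowly, which is what stabilizes the critic's bootstrapped targets.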
We welcome contributions and feedback. Feel free to open issues or submit pull requests to improve this project. 🚀