# MAPPO Inverse Kinematics for Robotic Arms

This project implements Multi-Agent Proximal Policy Optimization (MAPPO) to solve inverse kinematics problems for robotic arms. The system uses PyBullet for physics simulation and PyTorch for deep learning.
## Environment

- Custom OpenAI Gym environment
- Simulates a robotic arm (default: KUKA IIWA) in PyBullet
- Handles state observations, action application, and reward calculation (see the sketch below)
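A minimal sketch of what such an environment could look like follows. The class name `ArmEnv`, the velocity-control interpretation of actions, the distance-based reward, and the 5 cm success threshold are illustrative assumptions, not the project's actual code.

```python
# Hypothetical environment sketch; names, reward shaping, and thresholds
# are assumptions for illustration only.
import gym
import numpy as np
import pybullet as p
import pybullet_data


class ArmEnv(gym.Env):
    """Reach a randomly placed 3-D target with the arm's end effector."""

    def __init__(self, num_joints=7):
        p.connect(p.DIRECT)  # headless physics server
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        self.arm = p.loadURDF("kuka_iiwa/model.urdf", useFixedBase=True)
        self.num_joints = num_joints
        # One bounded command per joint (one per agent).
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(num_joints,))
        # Joint positions + velocities + 3-D target position.
        self.observation_space = gym.spaces.Box(
            -np.inf, np.inf, shape=(2 * num_joints + 3,))
        self.target = np.zeros(3)

    def reset(self):
        for j in range(self.num_joints):
            p.resetJointState(self.arm, j, targetValue=0.0)
        self.target = np.random.uniform([-0.5, -0.5, 0.2], [0.5, 0.5, 0.8])
        return self._observe()

    def step(self, action):
        # Apply each agent's action as a joint velocity command.
        for j, a in enumerate(action):
            p.setJointMotorControl2(self.arm, j, p.VELOCITY_CONTROL,
                                    targetVelocity=float(a))
        p.stepSimulation()
        ee_pos = np.array(p.getLinkState(self.arm, self.num_joints - 1)[0])
        dist = np.linalg.norm(ee_pos - self.target)
        reward = -dist          # dense penalty on end-effector distance
        done = dist < 0.05      # assumed success threshold
        return self._observe(), reward, done, {}

    def _observe(self):
        states = [p.getJointState(self.arm, j)[:2]
                  for j in range(self.num_joints)]
        qs, qds = zip(*states)
        return np.concatenate([qs, qds, self.target]).astype(np.float32)
```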
## MAPPO Agent

- Implements the MAPPO algorithm
- Manages multiple agents, one per joint of the robotic arm
- Uses a centralized critic and decentralized actors (see the sketch after this list)
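The centralized-critic / decentralized-actor layout could be wired roughly as below, with `ActorNet` and `CriticNet` as sketched in the next two sections; the per-joint learning rates and clip values are assumptions.

```python
# Hypothetical multi-agent container: one decentralized actor per joint,
# one centralized critic over the global state (CTDE pattern).
import torch


class MAPPOAgent:
    def __init__(self, obs_dim, state_dim, num_joints, actor_lrs, clip_epsilons):
        # Decentralized actors: each joint gets its own policy network.
        self.actors = [ActorNet(obs_dim) for _ in range(num_joints)]
        # Centralized critic: one value network over the full state.
        self.critic = CriticNet(state_dim)
        # Separate optimizers allow per-joint learning rates.
        self.actor_opts = [torch.optim.Adam(a.parameters(), lr=lr)
                           for a, lr in zip(self.actors, actor_lrs)]
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=3e-4)
        self.clip_epsilons = clip_epsilons  # per-joint PPO clip parameters

    @torch.no_grad()
    def act(self, obs):
        # Decentralized execution: each actor samples its own joint command.
        actions, log_probs = zip(*(actor.sample(obs) for actor in self.actors))
        return torch.stack(actions), torch.stack(log_probs)
```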
## Actor Network

- Predicts the action mean and standard deviation for each joint
- Uses a tanh activation for bounded actions, as in the sketch below
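A plausible shape for each per-joint actor; the hidden sizes and the log-std clamp range are assumptions.

```python
# Hypothetical per-joint actor: predicts a Gaussian over one joint's action,
# with tanh bounding the mean to the [-1, 1] action range.
import torch
import torch.nn as nn


class ActorNet(nn.Module):
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, 1)     # action mean for this joint
        self.log_std_head = nn.Linear(hidden, 1)  # state-dependent std

    def forward(self, obs):
        h = self.body(obs)
        mean = torch.tanh(self.mean_head(h))           # bounded mean
        std = self.log_std_head(h).clamp(-5, 2).exp()  # keep std in a sane range
        return torch.distributions.Normal(mean, std)

    def sample(self, obs):
        dist = self.forward(obs)
        action = dist.sample()
        return action.squeeze(-1), dist.log_prob(action).squeeze(-1)
```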
## Critic Network

- Estimates the value function for the entire (global) state, as sketched below
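The centralized critic is the simpler network; a sketch (hidden size assumed):

```python
# Hypothetical centralized critic: maps the full global state to a scalar
# value estimate V(s) shared by all joint agents.
import torch.nn as nn


class CriticNet(nn.Module):
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # scalar state value
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)
```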
## Training

- Episodic training loop
- Collects trajectories and updates the policy using PPO
- Implements Generalized Advantage Estimation (GAE)
- Uses separate learning rates and clip parameters for each joint (see the sketch after this list)
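GAE computes advantages via the backward recursion A_t = δ_t + γλ(1 − done_t)A_{t+1}, where δ_t = r_t + γ(1 − done_t)V(s_{t+1}) − V(s_t). Below is a sketch of that recursion and of the clipped PPO surrogate with a per-joint clip parameter; function names and defaults are assumptions.

```python
# Sketch of GAE and the clipped PPO actor loss; the per-joint clip epsilon
# is what the trainer would vary from joint to joint.
import torch


def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """values holds V(s_0..s_T), one extra entry for bootstrapping."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]  # zero out the bootstrap at episode ends
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
    return advantages


def ppo_actor_loss(new_log_probs, old_log_probs, advantages, clip_eps):
    """Clipped surrogate objective; clip_eps may differ per joint."""
    ratio = (new_log_probs - old_log_probs).exp()
    clipped = ratio.clamp(1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```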
## Logging and Visualization

- Tracks various performance metrics during training
- Generates plots and saves logs for analysis (see the sketch below)
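The logging side might be as simple as a dictionary of metric histories plus a plotting helper; the metric keys and file name here are illustrative.

```python
# Minimal sketch of metric tracking and plot generation during training.
from collections import defaultdict

import matplotlib
matplotlib.use("Agg")  # render to files, no display needed
import matplotlib.pyplot as plt


class MetricLogger:
    def __init__(self):
        self.history = defaultdict(list)

    def log(self, **metrics):
        # e.g. logger.log(episode_reward=r, mean_joint_error=e)
        for key, value in metrics.items():
            self.history[key].append(value)

    def plot(self, path="training_metrics.png"):
        fig, axes = plt.subplots(len(self.history), 1, squeeze=False,
                                 figsize=(8, 3 * len(self.history)))
        for ax, (key, values) in zip(axes.ravel(), self.history.items()):
            ax.plot(values)
            ax.set_title(key)
        fig.tight_layout()
        fig.savefig(path)
```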
## Key Features

- Multi-agent approach for controlling individual joints
- Centralized training with decentralized execution
- Dynamic difficulty adjustment during training (sketched after this list)
- Best-model saving based on joint error performance
- Comprehensive logging and visualization of training metrics
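Dynamic difficulty adjustment could be a simple curriculum on the target sampling radius, for example; the thresholds and step size below are assumptions.

```python
# Hypothetical curriculum: grow the target workspace when the agent succeeds
# often, shrink it when the success rate drops.
class DifficultyScheduler:
    def __init__(self, radius=0.2, r_min=0.1, r_max=0.8, step=0.05):
        self.radius = radius
        self.r_min, self.r_max, self.step = r_min, r_max, step

    def update(self, success_rate):
        if success_rate > 0.8:    # too easy: expand the sampling radius
            self.radius = min(self.radius + self.step, self.r_max)
        elif success_rate < 0.3:  # too hard: contract it
            self.radius = max(self.radius - self.step, self.r_min)
        return self.radius
```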
## Usage

1. Configure the environment and training parameters in `config.py` (an illustrative layout follows this list)
2. Run the main training script: `python main.py`
3. Monitor training progress through the logged metrics and generated plots
4. Use the trained model for testing or deployment
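For orientation, `config.py` might expose parameters along these lines; every name and value here is illustrative, not the project's actual configuration.

```python
# config.py -- illustrative layout; actual names and values may differ.
NUM_JOINTS = 7                  # KUKA IIWA has 7 revolute joints
MAX_EPISODE_STEPS = 200
NUM_EPISODES = 5000

GAMMA = 0.99                    # discount factor
GAE_LAMBDA = 0.95               # GAE smoothing parameter
PPO_EPOCHS = 10                 # gradient epochs per collected batch
CRITIC_LR = 3e-4

# Per-joint hyperparameters, one entry per joint.
ACTOR_LRS = [3e-4] * NUM_JOINTS
CLIP_EPSILONS = [0.2] * NUM_JOINTS
```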
## Dependencies

- PyTorch
- PyBullet
- OpenAI Gym
- NumPy
- Matplotlib
## Future Work

- Implement more advanced exploration strategies
- Add support for different robot models
- Optimize hyperparameters for better performance
- Implement multi-task learning for various IK problems