<div align="center">
  <h1 align="center">Unitree RL GYM</h1>
  <p align="center">
    <span>🌎English</span> | <a href="README_zh.md">🇨🇳中文</a>
  </p>
</div>
<p align="center">
  <strong>This is a repository of reinforcement learning implementations for Unitree robots, supporting the Unitree Go2, H1, H1_2, and G1.</strong>
</p>
<div align="center">

| <div align="center">Isaac Gym</div> | <div align="center">Mujoco</div> | <div align="center">Physical</div> |
|--- | --- | --- |
| [<img src="https://oss-global-cdn.unitree.com/static/32f06dc9dfe4452dac300dda45e86b34.GIF" width="240px">](https://oss-global-cdn.unitree.com/static/5bbc5ab1d551407080ca9d58d7bec1c8.mp4) | [<img src="https://oss-global-cdn.unitree.com/static/244cd5c4f823495fbfb67ef08f56aa33.GIF" width="240px">](https://oss-global-cdn.unitree.com/static/5aa48535ffd641e2932c0ba45c8e7854.mp4) | [<img src="https://oss-global-cdn.unitree.com/static/78c61459d3ab41448cfdb31f6a537e8b.GIF" width="240px">](https://oss-global-cdn.unitree.com/static/0818dcf7a6874b92997354d628adcacd.mp4) |

</div>
---
## 📦 Installation and Configuration
Please refer to [setup.md](/doc/setup_en.md) for installation and configuration steps.
## 🔁 Process Overview
The basic workflow for using reinforcement learning to achieve motion control is:
`Train` → `Play` → `Sim2Sim` → `Sim2Real`
- **Train**: Use the Gym simulation environment to let the robot interact with its surroundings and learn a policy that maximizes the designed rewards. Real-time visualization is not recommended during training, as it reduces efficiency.
- **Play**: Use the Play command to verify the trained policy and ensure it meets expectations.
- **Sim2Sim**: Deploy the Gym-trained policy to other simulators to ensure it is not overfitted to Gym-specific characteristics.
- **Sim2Real**: Deploy the policy to a physical robot to achieve motion control.
## 🛠️ User Guide
### 1. Training
Run the following command to start training:
```bash
python legged_gym/scripts/train.py --task=xxx
```
#### ⚙️ Parameter Description
- `--task`: Required. One of `go2`, `g1`, `h1`, `h1_2`.
- `--headless`: Run without a graphical interface (rendering is on by default); headless training is more efficient.
- `--resume`: Resume training from a checkpoint in the logs.
- `--experiment_name`: Name of the experiment to run/load.
- `--run_name`: Name of the run to execute/load.
- `--load_run`: Name of the run to load; defaults to the latest run.
- `--checkpoint`: Number of the checkpoint to load; defaults to the latest file.
- `--num_envs`: Number of environments to train in parallel.
- `--seed`: Random seed.
- `--max_iterations`: Maximum number of training iterations.
- `--sim_device`: Simulation computation device; specify `--sim_device=cpu` to use the CPU.
- `--rl_device`: Reinforcement learning computation device; specify `--rl_device=cpu` to use the CPU.
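For example, a typical headless training run that combines several of these options might look like the following sketch (the values are illustrative; `--headless` is passed as a bare flag here):

```bash
# Train a G1 policy without rendering, using 4096 parallel environments
# for at most 10000 iterations, under an illustrative run name.
python legged_gym/scripts/train.py --task=g1 --headless --num_envs=4096 --max_iterations=10000 --run_name=baseline
```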
**Default Training Result Directory**: `logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt`
---
### 2. Play
To visualize the training results in Gym, run the following command:
```bash
python legged_gym/scripts/play.py --task=xxx
```
**Description**:
- Play accepts the same parameters as Train.
- By default, it loads the latest model of the last run in the experiment folder.
- You can specify other models using `load_run` and `checkpoint` (see the example below).
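For instance, to replay a specific checkpoint from an earlier run (the run directory name and checkpoint number below are purely illustrative):

```bash
# Load checkpoint 5000 from a particular run instead of the latest model.
# Run directories follow the <date_time>_<run_name> pattern described above.
python legged_gym/scripts/play.py --task=g1 --load_run=Dec25_12-07-22_baseline --checkpoint=5000
```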
#### 💾 Export Network
Play exports the Actor network, saving it in `logs/{experiment_name}/exported/policies`:
- Standard (MLP) networks are exported as `policy_1.pt`.
- RNN networks are exported as `policy_lstm_1.pt`.
#### Play Results
| Go2 | G1 | H1 | H1_2 |
|--- | --- | --- | --- |
| [![go2](https://oss-global-cdn.unitree.com/static/ba006789e0af4fe3867255f507032cd7.GIF)](https://oss-global-cdn.unitree.com/static/d2e8da875473457c8d5d69c3de58b24d.mp4) | [![g1](https://oss-global-cdn.unitree.com/static/32f06dc9dfe4452dac300dda45e86b34.GIF)](https://oss-global-cdn.unitree.com/static/5bbc5ab1d551407080ca9d58d7bec1c8.mp4) | [![h1](https://oss-global-cdn.unitree.com/static/fa04e73966934efa9838e9c389f48fa2.GIF)](https://oss-global-cdn.unitree.com/static/522128f4640c4f348296d2761a33bf98.mp4) | [![h1_2](https://oss-global-cdn.unitree.com/static/83ed59ca0dab4a51906aff1f93428650.GIF)](https://oss-global-cdn.unitree.com/static/15fa46984f2343cb83342fd39f5ab7b2.mp4) |
---
### 3. Sim2Sim (Mujoco)
Run Sim2Sim in the Mujoco simulator:
```bash
python deploy/deploy_mujoco/deploy_mujoco.py {config_name}
```
#### Parameter Description
- `config_name`: Configuration file; the default search path is `deploy/deploy_mujoco/configs/`.
#### Example: Running G1
```bash
python deploy/deploy_mujoco/deploy_mujoco.py g1.yaml
```
#### ➡️ Replace Network Model
The default model is located at `deploy/pre_train/{robot}/motion.pt`. Custom-trained models are saved under `logs/{experiment_name}/exported/policies/` (e.g., `logs/g1/exported/policies/policy_lstm_1.pt` for an RNN policy). Update `policy_path` in the YAML configuration file accordingly.
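As a minimal sketch of that edit (excerpt only; the surrounding keys and exact path layout depend on your setup, but `policy_path` is the key described above):

```yaml
# deploy/deploy_mujoco/configs/g1.yaml (excerpt)
# Point policy_path at the custom-trained policy instead of the pre-trained model.
policy_path: "logs/g1/exported/policies/policy_lstm_1.pt"
```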
#### Simulation Results
| G1 | H1 | H1_2 |
|--- | --- | --- |
| [![mujoco_g1](https://oss-global-cdn.unitree.com/static/244cd5c4f823495fbfb67ef08f56aa33.GIF)](https://oss-global-cdn.unitree.com/static/5aa48535ffd641e2932c0ba45c8e7854.mp4) | [![mujoco_h1](https://oss-global-cdn.unitree.com/static/7ab4e8392e794e01b975efa205ef491e.GIF)](https://oss-global-cdn.unitree.com/static/8934052becd84d08bc8c18c95849cf32.mp4) | [![mujoco_h1_2](https://oss-global-cdn.unitree.com/static/2905e2fe9b3340159d749d5e0bc95cc4.GIF)](https://oss-global-cdn.unitree.com/static/ee7ee85bd6d249989a905c55c7a9d305.mp4) |
---
### 4. Sim2Real (Physical Deployment)
Before deploying to the physical robot, make sure it is in debug mode. Detailed steps can be found in the [Physical Deployment Guide](deploy/deploy_real/README.md):
```bash
python deploy/deploy_real/deploy_real.py {net_interface} {config_name}
```
#### Parameter Description
- `net_interface`: Name of the network interface connected to the robot, e.g., `enp3s0`.
- `config_name`: Configuration file located in `deploy/deploy_real/configs/`, e.g., `g1.yaml`, `h1.yaml`, `h1_2.yaml` (see the example below).
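For example, deploying the G1 configuration over an interface named `enp3s0` (interface names vary by machine):

```bash
# Deploy the G1 policy to the physical robot over the enp3s0 interface.
python deploy/deploy_real/deploy_real.py enp3s0 g1.yaml
```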
#### Deployment Results
| G1 | H1 | H1_2 |
|--- | --- | --- |
| [![real_g1](https://oss-global-cdn.unitree.com/static/78c61459d3ab41448cfdb31f6a537e8b.GIF)](https://oss-global-cdn.unitree.com/static/0818dcf7a6874b92997354d628adcacd.mp4) | [![real_h1](https://oss-global-cdn.unitree.com/static/fa07b2fd2ad64bb08e6b624d39336245.GIF)](https://oss-global-cdn.unitree.com/static/ea0084038d384e3eaa73b961f33e6210.mp4) | [![real_h1_2](https://oss-global-cdn.unitree.com/static/a88915e3523546128a79520aa3e20979.GIF)](https://oss-global-cdn.unitree.com/static/12d041a7906e489fae79d55b091a63dd.mp4) |
---
## 🎉 Acknowledgments
This repository is built upon the support and contributions of the following open-source projects. Special thanks to:
- [legged_gym](https://github.com/leggedrobotics/legged_gym): The foundation for the training and running code.
- [rsl_rl](https://github.com/leggedrobotics/rsl_rl.git): Reinforcement learning algorithm implementation.
- [mujoco](https://github.com/google-deepmind/mujoco.git): Powerful simulation functionality.
- [unitree_sdk2_python](https://github.com/unitreerobotics/unitree_sdk2_python.git): Hardware communication interface for physical deployment.
---
## 🔖 License
This project is licensed under the [BSD 3-Clause License](./LICENSE):
1. The original copyright notice must be retained.
2. The project name or organization name may not be used for promotion or endorsement.
3. Redistributions in source or binary form must reproduce the copyright notice.
For details, please read the full [LICENSE file](./LICENSE).