From e67da1d7a665622c89d32cd2a58e3b4cc5fd6f4a Mon Sep 17 00:00:00 2001 From: Alexander Soare Date: Tue, 21 May 2024 16:47:49 +0100 Subject: [PATCH] Add tutorials for using the training script and (#196) Co-authored-by: Remi --- Makefile | 19 ++ README.md | 18 +- examples/4_train_policy_with_script.md | 165 ++++++++++++++++++ .../advanced/1_train_act_pusht/act_pusht.yaml | 87 +++++++++ .../1_train_act_pusht/train_act_pusht.md | 70 ++++++++ .../2_calculate_validation_loss.py} | 0 tests/test_examples.py | 6 +- 7 files changed, 360 insertions(+), 5 deletions(-) create mode 100644 examples/4_train_policy_with_script.md create mode 100644 examples/advanced/1_train_act_pusht/act_pusht.yaml create mode 100644 examples/advanced/1_train_act_pusht/train_act_pusht.md rename examples/{4_calculate_validation_loss.py => advanced/2_calculate_validation_loss.py} (100%) diff --git a/Makefile b/Makefile index 9a8a2474..2a32048e 100644 --- a/Makefile +++ b/Makefile @@ -27,6 +27,7 @@ test-end-to-end: ${MAKE} test-tdmpc-ete-train ${MAKE} test-tdmpc-ete-eval ${MAKE} test-default-ete-eval + ${MAKE} test-act-pusht-tutorial test-act-ete-train: python lerobot/scripts/train.py \ @@ -142,3 +143,21 @@ test-default-ete-eval: eval.batch_size=1 \ env.episode_length=8 \ device=cpu \ + + +test-act-pusht-tutorial: + cp examples/advanced/1_train_act_pusht/act_pusht.yaml lerobot/configs/policy/created_by_Makefile.yaml + python lerobot/scripts/train.py \ + policy=created_by_Makefile.yaml \ + env=pusht \ + wandb.enable=False \ + training.offline_steps=2 \ + eval.n_episodes=1 \ + eval.batch_size=1 \ + env.episode_length=2 \ + device=cpu \ + training.save_model=true \ + training.save_freq=2 \ + training.batch_size=2 \ + hydra.run.dir=tests/outputs/act_pusht/ + rm lerobot/configs/policy/created_by_Makefile.yaml diff --git a/README.md b/README.md index e71b0e67..1019f234 100644 --- a/README.md +++ b/README.md @@ -99,6 +99,7 @@ wandb login ``` . ├── examples # contains demonstration examples, start here to learn about LeRobot +| └── advanced # contains even more examples for those who have mastered the basics ├── lerobot | ├── configs # contains hydra yaml files with all options that you can override in the command line | | ├── default.yaml # selected by default, it loads pusht environment and diffusion policy @@ -158,9 +159,10 @@ See `python lerobot/scripts/eval.py --help` for more instructions. ### Train your own policy -Check out [example 3](./examples/3_train_policy.py) that illustrates how to start training a model. +Check out [example 3](./examples/3_train_policy.py) that illustrates how to train a model using our core library in python, and [example 4](./examples/4_train_policy_with_script.md) that shows how to use our training script from command line. In general, you can use our training script to easily train any policy. Here is an example of training the ACT policy on trajectories collected by humans on the Aloha simulation environment for the insertion task: + ```bash python lerobot/scripts/train.py \ policy=act \ @@ -184,7 +186,19 @@ A link to the wandb logs for the run will also show up in yellow in your termina ![](media/wandb.png) -Note: For efficiency, during training every checkpoint is evaluated on a low number of episodes. After training, you may want to re-evaluate your best checkpoints on more episodes or change the evaluation settings. See `python lerobot/scripts/eval.py --help` for more instructions. +Note: For efficiency, during training every checkpoint is evaluated on a low number of episodes. 
You may use `eval.n_episodes=500` to evaluate on more episodes than the default. Or, after training, you may want to re-evaluate your best checkpoints on more episodes or change the evaluation settings. See `python lerobot/scripts/eval.py --help` for more instructions.

#### Reproduce state-of-the-art (SOTA)

We have organized our configuration files (found under [`lerobot/configs`](./lerobot/configs)) such that they reproduce SOTA results from a given model variant in their respective original works. Simply running:

```bash
python lerobot/scripts/train.py policy=diffusion env=pusht
```

reproduces SOTA results for Diffusion Policy on the PushT task.

Pretrained policies, along with reproduction details, can be found under the "Models" section of https://huggingface.co/lerobot.

## Contribute

diff --git a/examples/4_train_policy_with_script.md b/examples/4_train_policy_with_script.md new file mode 100644 index 00000000..baa1f1c9 --- /dev/null +++ b/examples/4_train_policy_with_script.md @@ -0,0 +1,165 @@

This tutorial will explain the training script, how to use it, and particularly the use of Hydra to configure everything needed for the training run.

## The training script

LeRobot offers a training script at [`lerobot/scripts/train.py`](../lerobot/scripts/train.py). At a high level it does the following:

- Loads a Hydra configuration file for the following steps (more on Hydra in a moment).
- Makes a simulation environment.
- Makes a dataset corresponding to that simulation environment.
- Makes a policy.
- Runs a standard training loop with forward pass, backward pass, optimization step, and occasional logging, evaluation (of the policy on the environment), and checkpointing.

## Our use of Hydra

Explaining the ins and outs of [Hydra](https://hydra.cc/docs/intro/) is beyond the scope of this document, but here we'll share the main points you need to know.

First, `lerobot/configs` has a directory structure like this:

```
.
├── default.yaml
├── env
│   ├── aloha.yaml
│   ├── pusht.yaml
│   └── xarm.yaml
└── policy
    ├── act.yaml
    ├── diffusion.yaml
    └── tdmpc.yaml
```

**_For brevity, in the rest of this document we'll drop the leading `lerobot/configs` path. So `default.yaml` really refers to `lerobot/configs/default.yaml`._**

When you run the training script with

```bash
python lerobot/scripts/train.py
```

Hydra is set up to read `default.yaml` (via the `@hydra.main` decorator). If you take a look at `@hydra.main`'s arguments, you will see `config_path="../configs", config_name="default"`. At the top of `default.yaml` is a `defaults` section, which looks like this:

```yaml
defaults:
  - _self_
  - env: pusht
  - policy: diffusion
```

This logic tells Hydra to incorporate configuration parameters from `env/pusht.yaml` and `policy/diffusion.yaml`. _Note: Be aware of the order, as any configuration parameters with the same name will be overridden. Thus, `default.yaml` is overridden by `env/pusht.yaml`, which is in turn overridden by `policy/diffusion.yaml`._

Then, `default.yaml` also contains common configuration parameters such as `device: cuda` or `use_amp: false` (for enabling fp16 training). Some other parameters are set to `???`, which indicates that they are expected to be set in additional yaml files. For instance, `training.offline_steps: ???` in `default.yaml` is set to `200000` in `diffusion.yaml`.
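If you'd like to see this merging behavior in isolation, here is a minimal standalone sketch using OmegaConf, the configuration library that Hydra builds on. It is not LeRobot code: the dictionaries are simplified stand-ins for `default.yaml`, `env/pusht.yaml` and `policy/diffusion.yaml`.

```python
# Minimal sketch of Hydra/OmegaConf config composition with made-up stand-in configs.
from omegaconf import MISSING, OmegaConf
from omegaconf.errors import MissingMandatoryValue

default_cfg = OmegaConf.create({"device": "cuda", "training": {"offline_steps": MISSING}})  # MISSING == "???"
env_cfg = OmegaConf.create({"env": {"name": "pusht", "task": "PushT-v0"}})
policy_cfg = OmegaConf.create({"policy": {"name": "diffusion"}, "training": {"offline_steps": 200000}})

# A `???` value is mandatory: reading it before something fills it in raises an error.
try:
    _ = default_cfg.training.offline_steps
except MissingMandatoryValue:
    print("training.offline_steps is not set yet in default.yaml")

# Configs listed later take precedence, mirroring `defaults: [_self_, env: ..., policy: ...]`.
cfg = OmegaConf.merge(default_cfg, env_cfg, policy_cfg)
print(cfg.training.offline_steps)  # 200000, supplied by the policy config
print(cfg.device)                  # cuda, untouched because nothing overrides it
```

Hydra does more than this (config groups, the command-line override grammar, output directory management), but merge order and mandatory `???` values behave just like in this sketch.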
Thanks to this `defaults` section in `default.yaml`, if you want to train Diffusion Policy with PushT, you really only need to run:

```bash
python lerobot/scripts/train.py
```

However, you can be more explicit and launch the exact same Diffusion Policy training on PushT with:

```bash
python lerobot/scripts/train.py policy=diffusion env=pusht
```

This way of overriding defaults via the CLI is especially useful when you want to change the policy and/or environment. For instance, you can train ACT on the default Aloha environment with:

```bash
python lerobot/scripts/train.py policy=act env=aloha
```

There are two things to note here:
- Config overrides are passed as `param_name=param_value`.
- Here we have overridden the `defaults` section. `policy=act` tells Hydra to use `policy/act.yaml`, and `env=aloha` tells Hydra to use `env/aloha.yaml`.

_As an aside: we've set up all of our configurations so that they reproduce state-of-the-art results from papers in the literature._

## Overriding configuration parameters in the CLI

Now let's say that we want to train on a different task in the Aloha environment. If you look in `env/aloha.yaml` you will see something like:

```yaml
# lerobot/configs/env/aloha.yaml
env:
  task: AlohaInsertion-v0
```

And if you look in `policy/act.yaml` you will see something like:

```yaml
# lerobot/configs/policy/act.yaml
dataset_repo_id: lerobot/aloha_sim_insertion_human
```

But our Aloha environment actually supports a cube transfer task as well. To train for this task, you could manually modify the two yaml configuration files as follows.

First, we'd need to switch to using the cube transfer task for the ALOHA environment.

```diff
# lerobot/configs/env/aloha.yaml
env:
-  task: AlohaInsertion-v0
+  task: AlohaTransferCube-v0
```

Then, we'd also need to switch to using the cube transfer dataset.

```diff
# lerobot/configs/policy/act.yaml
-dataset_repo_id: lerobot/aloha_sim_insertion_human
+dataset_repo_id: lerobot/aloha_sim_transfer_cube_human
```

Then, you'd be able to run:

```bash
python lerobot/scripts/train.py policy=act env=aloha
```

and you'd be training and evaluating on the cube transfer task.

An alternative approach to editing the yaml configuration files would be to override the defaults via the command line:

```bash
python lerobot/scripts/train.py \
  policy=act \
  dataset_repo_id=lerobot/aloha_sim_transfer_cube_human \
  env=aloha \
  env.task=AlohaTransferCube-v0
```

There's something new here. Notice the `.` delimiter used to traverse the configuration hierarchy. _But be aware that the `defaults` section is an exception. As you saw above, we didn't need to write `defaults.policy=act` in the CLI. `policy=act` was enough._
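To make the `.` traversal a little more concrete, here is a rough sketch of what a dotted override boils down to, written directly with OmegaConf and simplified stand-in values rather than the actual LeRobot configs. Hydra's real override grammar handles more cases, but the idea is the same.

```python
# Rough sketch: a dotted CLI override like `env.task=...` is applied to the nested config tree.
from omegaconf import OmegaConf

base = OmegaConf.create({
    "dataset_repo_id": "lerobot/aloha_sim_insertion_human",
    "env": {"name": "aloha", "task": "AlohaInsertion-v0"},
})

# What `dataset_repo_id=... env.task=...` on the command line amounts to:
overrides = OmegaConf.from_dotlist([
    "dataset_repo_id=lerobot/aloha_sim_transfer_cube_human",
    "env.task=AlohaTransferCube-v0",  # the `.` walks into the `env` group
])

cfg = OmegaConf.merge(base, overrides)
print(cfg.env.task)         # AlohaTransferCube-v0
print(cfg.dataset_repo_id)  # lerobot/aloha_sim_transfer_cube_human
print(cfg.env.name)         # aloha (untouched keys are preserved)
```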
Putting all that knowledge together, here's the command that was used to train https://huggingface.co/lerobot/act_aloha_sim_transfer_cube_human.

```bash
python lerobot/scripts/train.py \
  hydra.run.dir=outputs/train/act_aloha_sim_transfer_cube_human \
  device=cuda \
  env=aloha \
  env.task=AlohaTransferCube-v0 \
  dataset_repo_id=lerobot/aloha_sim_transfer_cube_human \
  policy=act \
  training.eval_freq=10000 \
  training.log_freq=250 \
  training.offline_steps=100000 \
  training.save_model=true \
  training.save_freq=25000 \
  eval.n_episodes=50 \
  eval.batch_size=50 \
  wandb.enable=false
```

There's one new thing here: `hydra.run.dir=outputs/train/act_aloha_sim_transfer_cube_human`, which specifies where to save the training output.

---

So far we've seen how to train Diffusion Policy for PushT and ACT for ALOHA. Now, what if we want to train ACT for PushT? Well, there are aspects of the ACT configuration that are specific to the ALOHA environments, and these happen to be incompatible with PushT. Therefore, trying to run the following will almost certainly raise an exception of some sort (e.g. a feature dimension mismatch):

```bash
python lerobot/scripts/train.py policy=act env=pusht dataset_repo_id=lerobot/pusht
```

Please head on over to our [advanced tutorial on adapting policy configuration to various environments](./advanced/1_train_act_pusht/train_act_pusht.md) to learn more.

Or, in the meantime, happy coding! 🤗

diff --git a/examples/advanced/1_train_act_pusht/act_pusht.yaml b/examples/advanced/1_train_act_pusht/act_pusht.yaml new file mode 100644 index 00000000..38e542fb --- /dev/null +++ b/examples/advanced/1_train_act_pusht/act_pusht.yaml @@ -0,0 +1,87 @@

# @package _global_

# Change the seed to match what PushT eval uses
# (to avoid evaluating on seeds used for generating the training data).
seed: 100000
# Change the dataset repository to the PushT one.
dataset_repo_id: lerobot/pusht

override_dataset_stats:
  observation.image:
    # stats from imagenet, since we use a pretrained vision model
    mean: [[[0.485]], [[0.456]], [[0.406]]]  # (c,1,1)
    std: [[[0.229]], [[0.224]], [[0.225]]]  # (c,1,1)

training:
  offline_steps: 80000
  online_steps: 0
  eval_freq: 10000
  save_freq: 100000
  log_freq: 250
  save_model: true

  batch_size: 8
  lr: 1e-5
  lr_backbone: 1e-5
  weight_decay: 1e-4
  grad_clip_norm: 10
  online_steps_between_rollouts: 1

  delta_timestamps:
    action: "[i / ${fps} for i in range(${policy.chunk_size})]"

eval:
  n_episodes: 50
  batch_size: 50

# See `configuration_act.py` for more details.
policy:
  name: act

  # Input / output structure.
  n_obs_steps: 1
  chunk_size: 100
  n_action_steps: 100

  input_shapes:
    observation.image: [3, 96, 96]
    observation.state: ["${env.state_dim}"]
  output_shapes:
    action: ["${env.action_dim}"]

  # Normalization / Unnormalization
  input_normalization_modes:
    observation.image: mean_std
    # Use min_max normalization just because it's more standard.
    observation.state: min_max
  output_normalization_modes:
    # Use min_max normalization just because it's more standard.
    action: min_max

  # Architecture.
  # Vision backbone.
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false
  # Transformer layers.
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  # Note: Although the original ACT implementation has 7 for `n_decoder_layers`, there is a bug in the code
  # that means only the first layer is used.
  # Here we match the original implementation by setting this to 1.
  # See this issue https://github.com/tonyzhaozh/act/issues/25#issue-2258740521.
  n_decoder_layers: 1
  # VAE.
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Inference.
  temporal_ensemble_momentum: null

  # Training and loss computation.
  dropout: 0.1
  kl_weight: 10.0

diff --git a/examples/advanced/1_train_act_pusht/train_act_pusht.md b/examples/advanced/1_train_act_pusht/train_act_pusht.md new file mode 100644 index 00000000..0258c991 --- /dev/null +++ b/examples/advanced/1_train_act_pusht/train_act_pusht.md @@ -0,0 +1,70 @@

In this tutorial we will learn how to adapt a policy configuration to be compatible with a new environment and dataset. As a concrete example, we will adapt the default configuration for ACT to be compatible with the PushT environment and dataset.

If you haven't already read our tutorial on the [training script and configuration tooling](../../4_train_policy_with_script.md), please do so prior to tackling this tutorial.

Let's get started!

Suppose we want to train ACT for PushT. Well, there are aspects of the ACT configuration that are specific to the ALOHA environments, and these happen to be incompatible with PushT. Therefore, trying to run the following will almost certainly raise an exception of some sort (e.g. a feature dimension mismatch):

```bash
python lerobot/scripts/train.py policy=act env=pusht dataset_repo_id=lerobot/pusht
```

We need to adapt the parameters of the ACT policy configuration to the PushT environment. The most important ones are the image keys.

ALOHA's datasets and environments typically use a variable number of cameras. In `lerobot/configs/policy/act.yaml` you may notice two relevant sections. Here we show you the minimal diff needed to adjust to PushT:

```diff
override_dataset_stats:
-  observation.images.top:
+  observation.image:
    # stats from imagenet, since we use a pretrained vision model
    mean: [[[0.485]], [[0.456]], [[0.406]]]  # (c,1,1)
    std: [[[0.229]], [[0.224]], [[0.225]]]  # (c,1,1)

policy:
  input_shapes:
-    observation.images.top: [3, 480, 640]
+    observation.image: [3, 96, 96]
    observation.state: ["${env.state_dim}"]
  output_shapes:
    action: ["${env.action_dim}"]

  input_normalization_modes:
-    observation.images.top: mean_std
+    observation.image: mean_std
    observation.state: min_max
  output_normalization_modes:
    action: min_max
```

Here we've accounted for the following:
- PushT uses `observation.image` for its image key.
- PushT provides smaller images.

_Side note: technically we could override these via the CLI, but with many changes it gets a bit messy, and we also have a bit of a challenge in that Hydra treats the `.` in our observation keys as a hierarchy separator._

For your convenience, we provide [`act_pusht.yaml`](./act_pusht.yaml) in this directory. It contains the diff above, plus some other (optional) changes that are explained within. Please copy it into `lerobot/configs/policy` with:

```bash
cp examples/advanced/1_train_act_pusht/act_pusht.yaml lerobot/configs/policy/act_pusht.yaml
```

(Remember from a [previous tutorial](../../4_train_policy_with_script.md) that Hydra will look in the `lerobot/configs` directory.)
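Before launching a full run, you can optionally check that the new file composes the way you expect. The sketch below uses Hydra's compose API to build the same configuration that `train.py` would see, without starting any training. It is only an illustration: the `config_path`, the `version_base` argument, and running it from the repository root are assumptions you may need to adjust for your setup.

```python
# Optional sanity check (illustrative only, not a required tutorial step): compose the config
# as train.py would, without launching training. Assumes this snippet is saved and run from
# the repository root so that `lerobot/configs` is a valid relative path.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="lerobot/configs", version_base="1.2"):
    cfg = compose(config_name="default", overrides=["policy=act_pusht", "env=pusht"])

print(cfg.policy.name)                               # act
print(cfg.policy.input_shapes["observation.image"])  # [3, 96, 96]
print(OmegaConf.to_yaml(cfg.env))                    # the PushT environment section
```

If the compose step fails with a "could not find" error, it typically means the copy step above was skipped or the file landed somewhere other than `lerobot/configs/policy`.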
Now try running the following:

```bash
python lerobot/scripts/train.py policy=act_pusht env=pusht
```

Notice that this is much the same as the command that failed at the start of the tutorial, only:
- Now we are using `policy=act_pusht` to point to our new configuration file.
- We can drop `dataset_repo_id=lerobot/pusht`, as the change is incorporated in our new configuration file.

Hurrah! You're now training ACT for the PushT environment.

---

The bottom line of this tutorial is that when training policies for different environments and datasets, you will need to understand which parts of the policy configuration are specific to them and make changes accordingly.

Happy coding! 🤗

diff --git a/examples/4_calculate_validation_loss.py b/examples/advanced/2_calculate_validation_loss.py similarity index 100% rename from examples/4_calculate_validation_loss.py rename to examples/advanced/2_calculate_validation_loss.py

diff --git a/tests/test_examples.py b/tests/test_examples.py index a0c60b7e..0a6ce422 100644 --- a/tests/test_examples.py +++ b/tests/test_examples.py @@ -45,11 +45,11 @@ def test_example_1():

 @require_package("gym_pusht")
-def test_examples_2_through_4():
+def test_examples_basic2_basic3_advanced1():
     """
     Train a model with example 3, check the outputs.
     Evaluate the trained model with example 2, check the outputs.
-    Calculate the validation loss with example 4, check the outputs.
+    Calculate the validation loss with advanced example 1, check the outputs.
     """

     ### Test example 3
@@ -97,7 +97,7 @@
     assert Path("outputs/eval/example_pusht_diffusion/rollout.mp4").exists()

     ## Test example 4
-    file_contents = _read_file("examples/4_calculate_validation_loss.py")
+    file_contents = _read_file("examples/advanced/2_calculate_validation_loss.py")

     # Run on a single example from the last episode, use CPU, and use the local model.
     file_contents = _find_and_replace(