ready for review

2024-05-17 17:10:07 +01:00 · 2024-05-17 17:10:07 +01:00 · 4e9b484d1c
parent 5ec0af62c6
commit 4e9b484d1c
6 changed files with 345 additions and 1 deletions
--- a/19
+++ b/19
@ -25,6 +25,7 @@ test-end-to-end:
 	${MAKE} test-tdmpc-ete-train
 	${MAKE} test-tdmpc-ete-eval
 	${MAKE} test-default-ete-eval
+	${MAKE} test-act-pusht-tutorial

 test-act-ete-train:
 	python lerobot/scripts/train.py \
@ -108,3 +109,21 @@ test-default-ete-eval:
 		eval.batch_size=1 \
 		env.episode_length=8 \
 		device=cpu \
+
+
+test-act-pusht-tutorial:
+	cp examples/advanced/train_act_pusht/act_pusht.yaml lerobot/configs/policy/created_by_Makefile.yaml
+	python lerobot/scripts/train.py \
+		policy=created_by_Makefile.yaml \
+		env=pusht \
+		wandb.enable=False \
+		training.offline_steps=2 \
+		eval.n_episodes=1 \
+		eval.batch_size=1 \
+		env.episode_length=2 \
+		device=cpu \
+		training.save_model=true \
+		training.save_freq=2 \
+		training.batch_size=2 \
+		hydra.run.dir=tests/outputs/act_pusht/
+	rm lerobot/configs/policy/created_by_Makefile.yaml
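
Note on the `test-act-pusht-tutorial` recipe above: it copies a config into `lerobot/configs/policy`, runs training, and then removes the copy. Since make stops a recipe at the first failing command, the final `rm` is skipped whenever training fails, and the copied file is left behind. The following is a minimal Python sketch of the same copy, run, and clean-up sequence with the clean-up guaranteed. It is not part of this change; the helper name `run_with_temporary_config` and the demo file contents are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_temporary_config(src, dst, command):
    """Copy `src` to `dst`, run `command`, and always remove `dst` afterwards."""
    shutil.copyfile(src, dst)
    try:
        # check=False: report the exit code instead of raising on failure.
        return subprocess.run(command, check=False).returncode
    finally:
        # Runs whether the command succeeded, failed, or raised.
        Path(dst).unlink(missing_ok=True)


# Demo in a throwaway directory with a command that fails on purpose.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "act_pusht.yaml"
    dst = Path(tmp) / "created_by_Makefile.yaml"
    src.write_text("seed: 100000\n")
    failing_command = [sys.executable, "-c", "raise SystemExit(3)"]
    exit_code = run_with_temporary_config(src, dst, failing_command)
    copy_removed = not dst.exists()
```

In the demo the command exits with a non-zero code, yet the copied file is still removed. A comparable effect in the Makefile itself would need a different recipe structure, which is outside this sketch.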
--- a/README.md
+++ b/README.md
@ -177,11 +177,13 @@ A link to the wandb logs for the run will also show up in yellow in your termina

 You can deactivate wandb by adding these arguments to the `train.py` python command:
 ```bash
+    # this first one is not necessary to disable wandb, but you can set it with wandb enabled to avoid
+    # uploading model checkpoints
    wandb.disable_artifact=true \
    wandb.enable=false
 ```

-Note: For efficiency, during training every checkpoint is evaluated on a low number of episodes. After training, you may want to re-evaluate your best checkpoints on more episodes or change the evaluation settings. See `python lerobot/scripts/eval.py --help` for more instructions.
+Note: For efficiency, every checkpoint is evaluated on a small number of episodes during training. You can pass `eval.n_episodes=500` to evaluate on more episodes than the default. Alternatively, after training, you may want to re-evaluate your best checkpoints on more episodes or with different evaluation settings. See `python lerobot/scripts/eval.py --help` for more instructions.


 ## Contribute
--- a/examples/4_train_policy_with_script.md
+++ b/examples/4_train_policy_with_script.md
@ -0,0 +1,157 @@
+This tutorial will explain the training script, how to use it, and particularly the use of Hydra to configure everything needed for the training run.
+
+## The training script
+
+LeRobot offers a training script at [`lerobot/scripts/train.py`](../../lerobot/scripts/train.py). At a high level it does the following:
+
+- Loads a Hydra configuration file for the following steps (more on Hydra in a moment).
+- Makes a simulation environment.
+- Makes a dataset corresponding to that simulation environment.
+- Makes a policy.
+- Runs a standard training loop with forward pass, backward pass, optimization step, and occasional logging, evaluation (of the policy on the environment), and checkpointing.
+
+## Our use of Hydra
+
+Explaining the ins and outs of [Hydra](https://hydra.cc/docs/intro/) is beyond the scope of this document, but here we'll share the main points you need to know.
+
+First, consider that `lerobot/configs` might have a directory structure like this (this is the case at the time of writing):
+
+```
+.
+├── default.yaml
+├── env
+│   ├── aloha.yaml
+│   ├── pusht.yaml
+│   └── xarm.yaml
+└── policy
+    ├── act.yaml
+    ├── diffusion.yaml
+    └── tdmpc.yaml
+```
+
+**_For brevity, in the rest of this document we'll drop the leading `lerobot/configs` path. So `default.yaml` really refers to `lerobot/configs/default.yaml`._**
+
+When you run the training script, Hydra takes over via the `@hydra.main` decorator. If you look at the arguments passed to `@hydra.main`, you will see `config_path="../configs", config_name="default"`. This means Hydra looks for `default.yaml` in `../configs` (which resolves to `lerobot/configs`).
+
+Alongside regular configuration hyperparameters like `device: cuda`, `default.yaml` has a `defaults` section. It might look like this:
+
+```yaml
+defaults:
+  - _self_
+  - env: pusht
+  - policy: diffusion
+```
+
+So, Hydra will grab `env/pusht.yaml` and `policy/diffusion.yaml` and incorporate their configuration parameters. Because `_self_` is listed first, any configuration parameters already present in `default.yaml` are overridden by the files listed after it.
+
+## Running the training script with our provided configurations
+
+If you want to train Diffusion Policy with PushT, you really only need to run:
+
+```bash
+python lerobot/scripts/train.py
+```
+
+That's because `default.yaml` already defaults to using Diffusion Policy and PushT. To be more explicit, you could also do the following (which would have the same effect):
+
+```bash
+python lerobot/scripts/train.py policy=diffusion env=pusht
+```
+
+If you want to train ACT with Aloha, you can do:
+
+```bash
+python lerobot/scripts/train.py policy=act env=aloha
+```
+
+**Notice how the config overrides are passed** as `param_name=param_value`. This is the format Hydra expects when parsing overrides.
+
+## Overriding configuration parameters in the CLI
+
+If you look in `env/aloha.yaml` you might see:
+
+```yaml
+# lerobot/configs/env/aloha.yaml
+env:
+  task: AlohaInsertion-v0
+```
+
+And if you look in `policy/act.yaml` you might see:
+
+```yaml
+# lerobot/configs/policy/act.yaml
+dataset_repo_id: lerobot/aloha_sim_insertion_human
+```
+
+But our Aloha environment also supports a cube transfer task. To train for this task, you _could_ modify both configuration files.
+
+We need to select the cube transfer task for the ALOHA environment.
+
+```yaml
+# lerobot/configs/env/aloha.yaml
+env:
+  task: AlohaTransferCube-v0
+```
+
+We also need to use the cube transfer dataset.
+
+```yaml
+# lerobot/configs/policy/act.yaml
+dataset_repo_id: lerobot/aloha_sim_transfer_cube_human
+```
+
+Now you'd be able to run:
+
+```bash
+python lerobot/scripts/train.py policy=act env=aloha
+```
+
+and you'd be training and evaluating on the cube transfer task.
+
+Or, you could leave the configuration files in their original state and override the defaults via the command line:
+
+```bash
+python lerobot/scripts/train.py \
+    policy=act \
+    dataset_repo_id=lerobot/aloha_sim_transfer_cube_human \
+    env=aloha \
+    env.task=AlohaTransferCube-v0
+```
+
+There's something new here. Notice the `.` delimiter used to traverse the configuration hierarchy.
+
+Putting all that knowledge together, here's the command that was used to train https://huggingface.co/lerobot/act_aloha_sim_transfer_cube_human.
+
+```bash
+python lerobot/scripts/train.py \
+    hydra.run.dir=outputs/train/act_aloha_sim_transfer_cube_human \
+    device=cuda \
+    env=aloha \
+    env.task=AlohaTransferCube-v0 \
+    dataset_repo_id=lerobot/aloha_sim_transfer_cube_human \
+    policy=act \
+    training.eval_freq=10000 \
+    training.log_freq=250 \
+    training.offline_steps=100000 \
+    training.save_model=true \
+    training.save_freq=25000 \
+    eval.n_episodes=50 \
+    eval.batch_size=50 \
+    wandb.enable=false
+```
+
+There's one new thing here: `hydra.run.dir=outputs/train/act_aloha_sim_transfer_cube_human`, which specifies where to save the training output.
+
+---
+
+Now, why don't you try running:
+
+```bash
+python lerobot/scripts/train.py policy=act env=pusht dataset_repo_id=lerobot/pusht
+```
+
+That was a little mean of us, because if you did try running that code, you almost certainly got an exception of sorts. That's because there are aspects of the ACT configuration that are specific to the ALOHA environments, and here we have tried to use PushT.
+
+Please, head on over to our advanced [tutorial on adapting policy configuration to various environments](./advanced/train_act_pusht/train_act_pusht.md).
+
+Or in the meantime, happy coding! 🤗
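
Note on `examples/4_train_policy_with_script.md` above: the tutorial describes two mechanisms, the `defaults` list that layers `env/*.yaml` and `policy/*.yaml` over `default.yaml`, and `key=value` overrides that use `.` to walk the hierarchy. The sketch below imitates both with plain dictionaries so the merge order is easy to see. It is a simplified illustration and not Hydra's implementation; `deep_merge`, `apply_override`, and the dictionary contents are hypothetical stand-ins for the real YAML files, and override values are kept as plain strings.

```python
import copy


def deep_merge(base, incoming):
    """Return a new dict in which values from `incoming` win over `base`."""
    merged = copy.deepcopy(base)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(cfg, override):
    """Apply one `dotted.path=value` override in place (value stays a string)."""
    path, value = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


# Hypothetical stand-ins for default.yaml, env/aloha.yaml and policy/act.yaml.
default_cfg = {"device": "cuda", "dataset_repo_id": "lerobot/pusht"}
env_aloha = {"env": {"name": "aloha", "task": "AlohaInsertion-v0"}}
policy_act = {
    "dataset_repo_id": "lerobot/aloha_sim_insertion_human",
    "policy": {"name": "act"},
}

# `_self_` comes first in the defaults list, so the later files win on conflicts.
cfg = default_cfg
for group_cfg in (env_aloha, policy_act):
    cfg = deep_merge(cfg, group_cfg)

# Command-line overrides are applied last and therefore win over every file.
for override in (
    "dataset_repo_id=lerobot/aloha_sim_transfer_cube_human",
    "env.task=AlohaTransferCube-v0",
):
    apply_override(cfg, override)
```

After the loop, `cfg["env"]["task"]` holds the cube transfer task while untouched keys such as `cfg["device"]` keep their values from the base configuration.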
--- a/examples/advanced/train_act_pusht/act_pusht.yaml
+++ b/examples/advanced/train_act_pusht/act_pusht.yaml
@ -0,0 +1,87 @@
+# @package _global_
+
+# Change the seed to match what PushT eval uses
+# (to avoid evaluating on seeds used for generating the training data).
+seed: 100000
+# Change the dataset repository to the PushT one.
+dataset_repo_id: lerobot/pusht
+
+override_dataset_stats:
+  observation.image:
+    # stats from imagenet, since we use a pretrained vision model
+    mean: [[[0.485]], [[0.456]], [[0.406]]]  # (c,1,1)
+    std: [[[0.229]], [[0.224]], [[0.225]]]  # (c,1,1)
+
+training:
+  offline_steps: 80000
+  online_steps: 0
+  eval_freq: 10000
+  save_freq: 100000
+  log_freq: 250
+  save_model: true
+
+  batch_size: 8
+  lr: 1e-5
+  lr_backbone: 1e-5
+  weight_decay: 1e-4
+  grad_clip_norm: 10
+  online_steps_between_rollouts: 1
+
+  delta_timestamps:
+    action: "[i / ${fps} for i in range(${policy.chunk_size})]"
+
+eval:
+  n_episodes: 50
+  batch_size: 50
+
+# See `configuration_act.py` for more details.
+policy:
+  name: act
+
+  # Input / output structure.
+  n_obs_steps: 1
+  chunk_size: 100 # chunk_size
+  n_action_steps: 100
+
+  input_shapes:
+    observation.image: [3, 96, 96]
+    observation.state: ["${env.state_dim}"]
+  output_shapes:
+    action: ["${env.action_dim}"]
+
+  # Normalization / Unnormalization
+  input_normalization_modes:
+    observation.image: mean_std
+    # Use min_max normalization just because it's more standard.
+    observation.state: min_max
+  output_normalization_modes:
+    # Use min_max normalization just because it's more standard.
+    action: min_max
+
+  # Architecture.
+  # Vision backbone.
+  vision_backbone: resnet18
+  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
+  replace_final_stride_with_dilation: false
+  # Transformer layers.
+  pre_norm: false
+  dim_model: 512
+  n_heads: 8
+  dim_feedforward: 3200
+  feedforward_activation: relu
+  n_encoder_layers: 4
+  # Note: Although the original ACT implementation has 7 for `n_decoder_layers`, there is a bug in the code
+  # that means only the first layer is used. Here we match the original implementation by setting this to 1.
+  # See this issue https://github.com/tonyzhaozh/act/issues/25#issue-2258740521.
+  n_decoder_layers: 1
+  # VAE.
+  use_vae: true
+  latent_dim: 32
+  n_vae_encoder_layers: 4
+
+  # Inference.
+  temporal_ensemble_momentum: null
+
+  # Training and loss computation.
+  dropout: 0.1
+  kl_weight: 10.0
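
Note on `act_pusht.yaml` above: the `delta_timestamps.action` entry is a string that combines `${...}` interpolations with a Python list comprehension. The sketch below shows what such a string could expand to once the placeholders are filled in. It is an illustration under assumptions and not the loading code used by the training script: the `resolve` helper is hypothetical, and `fps = 10` is an assumed value that would normally come from the rest of the configuration.

```python
import re


def resolve(expr, cfg):
    """Replace each `${dotted.key}` in `expr` with its value from `cfg`."""

    def lookup(match):
        node = cfg
        for name in match.group(1).split("."):
            node = node[name]
        return str(node)

    return re.sub(r"\$\{([^}]+)\}", lookup, expr)


# Assumed values for illustration; chunk_size matches act_pusht.yaml.
cfg = {"fps": 10, "policy": {"chunk_size": 100}}
expr = "[i / ${fps} for i in range(${policy.chunk_size})]"

resolved = resolve(expr, cfg)

# eval is tolerable here only because the string comes from a trusted config file.
action_offsets = eval(resolved)
```

Under these assumptions the expression yields one time offset (in seconds) per action in the chunk, starting at 0 and spaced `1 / fps` apart.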
--- a/examples/advanced/train_act_pusht/train_act_pusht.md
+++ b/examples/advanced/train_act_pusht/train_act_pusht.md
@ -0,0 +1,62 @@
+In this tutorial we will adapt the default configuration for ACT to be compatible with the PushT environment and dataset.
+
+If you haven't already read our tutorial on the [training script and configuration tooling](../4_train_policy_with_script.md) please do so prior to tackling this tutorial.
+
+Let's get started! Now, why don't you try running:
+
+```bash
+python lerobot/scripts/train.py policy=act env=pusht dataset_repo_id=lerobot/pusht
+```
+
+That was a little mean of us, because if you did try running that command, you almost certainly got an exception of sorts. That's because there are aspects of the ACT configuration that are specific to the ALOHA environments, and here we have tried to use PushT.
+
+The most important ones are the image keys. ALOHA's datasets and environments typically use a variable number of cameras. In `lerobot/configs/policy/act.yaml` you may notice two relevant sections. Here we show you the minimal diff needed to adjust to PushT:
+
+```diff
+ override_dataset_stats:
+-  observation.images.top:
++  observation.image:
+     # stats from imagenet, since we use a pretrained vision model
+     mean: [[[0.485]], [[0.456]], [[0.406]]]  # (c,1,1)
+     std: [[[0.229]], [[0.224]], [[0.225]]]  # (c,1,1)
+
+ policy:
+   input_shapes:
+-    observation.images.top: [3, 480, 640]
++    observation.image: [3, 96, 96]
+     observation.state: ["${env.state_dim}"]
+   output_shapes:
+     action: ["${env.action_dim}"]
+
+   input_normalization_modes:
+-    observation.images.top: mean_std
++    observation.image: mean_std
+     observation.state: min_max
+   output_normalization_modes:
+     action: min_max
+```
+
+Here we've accounted for the following:
+- PushT uses "observation.image" for its image key.
+- PushT provides smaller images.
+
+_Side note: technically we could override these via the CLI, but with many changes it gets a bit messy, and we also have a bit of a challenge in that we're using `.` in our observation keys which is treated by Hydra as a hierarchical separator_.
+
+For your convenience, we provide [`act_pusht.yaml`](./act_pusht.yaml) in this directory. It contains the diff above, plus some other (optional) ones that are explained within. Please copy it into `lerobot/configs/policy` (remember from a [previous tutorial](../4_train_policy_with_script.md) that Hydra will look in the `lerobot/configs` directory). Now try running the following.
+
+<!-- Note to contributor: are you changing this command? Note that it's tested in `Makefile`, so change it there too! -->
+```bash
+python lerobot/scripts/train.py policy=act_pusht env=pusht
+```
+
+Notice that this is much the same as the command that failed at the start of the tutorial, only:
+- Now we are using `policy=act_pusht` to point to our new configuration file.
+- We can drop `dataset_repo_id=lerobot/pusht` as the change is incorporated in our new configuration file.
+
+Hurrah! You're now training ACT for the PushT environment.
+
+---
+
+The bottom line of this tutorial is that when training policies for different environments and datasets you will need to understand what parts of the policy configuration are specific to those and make changes accordingly.
+
+Happy coding! 🤗
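
Note on the side note in `train_act_pusht.md` above: observation keys such as `observation.image` contain a `.`, which Hydra treats as a hierarchical separator in command-line overrides. The sketch below uses a naive override parser to show why that is awkward: splitting on every dot creates a nested `observation` entry instead of updating the flat `observation.image` key. The functions `split_override` and `set_nested` and the `[3,84,84]` value are hypothetical, and Hydra's real parser is more involved than this.

```python
def split_override(override):
    """Split `a.b.c=value` into (['a', 'b', 'c'], 'value') on every dot."""
    path, value = override.split("=", 1)
    return path.split("."), value


def set_nested(cfg, keys, value):
    """Walk `keys` through nested dicts, creating levels as needed."""
    node = cfg
    for name in keys[:-1]:
        node = node.setdefault(name, {})
    node[keys[-1]] = value


# The YAML file stores the image key as one flat string that contains a dot.
cfg = {"policy": {"input_shapes": {"observation.image": [3, 96, 96]}}}

keys, value = split_override("policy.input_shapes.observation.image=[3,84,84]")
set_nested(cfg, keys, value)

input_shapes = cfg["policy"]["input_shapes"]
```

In this sketch the original `observation.image` entry is left unchanged and a separate nested `observation` mapping appears next to it, which is one reason a dedicated YAML file is the tidier route.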
--- a/examples/advanced/train_act_pusht/train_act_pusht.sh
+++ b/examples/advanced/train_act_pusht/train_act_pusht.sh
@ -0,0 +1,17 @@
+python lerobot/scripts/train.py \
+    hydra.job.name=act_pusht \
+    hydra.run.dir=outputs/train/act_pusht \
+    env=aloha \
+    env.task=AlohaInsertion-v0 \
+    dataset_repo_id=lerobot/pusht \
+    policy=act \
+    policy.use_vae=true \
+    training.eval_freq=10000 \
+    training.log_freq=250 \
+    training.offline_steps=100000 \
+    training.save_model=true \
+    training.save_freq=25000 \
+    eval.n_episodes=50 \
+    eval.batch_size=50 \
+    wandb.enable=false \
+    device=cuda