# lerobot/tests/test_policies.py

import pytest
import torch
from lerobot.common.datasets.utils import cycle
from lerobot.common.envs.utils import postprocess_action, preprocess_observation
from lerobot.common.policies.factory import make_policy
from lerobot.common.envs.factory import make_env
from lerobot.common.datasets.factory import make_dataset
from lerobot.common.utils import init_hydra_config
from .utils import DEVICE, DEFAULT_CONFIG_PATH


@pytest.mark.parametrize(
    "env_name,policy_name,extra_overrides",
    [
        ("xarm", "tdmpc", ["policy.mpc=true"]),
        ("pusht", "tdmpc", ["policy.mpc=false"]),
        ("pusht", "diffusion", []),
        ("aloha", "act", ["env.task=AlohaInsertion-v0", "dataset_id=aloha_sim_insertion_human"]),
        ("aloha", "act", ["env.task=AlohaInsertion-v0", "dataset_id=aloha_sim_insertion_scripted"]),
        ("aloha", "act", ["env.task=AlohaTransferCube-v0", "dataset_id=aloha_sim_transfer_cube_human"]),
        ("aloha", "act", ["env.task=AlohaTransferCube-v0", "dataset_id=aloha_sim_transfer_cube_scripted"]),
        # TODO(aliberts): xarm is not yet working with diffusion
        # ("xarm", "diffusion", []),
    ],
)
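# Each tuple above becomes one pytest case. A single combination can be run via
# pytest's `-k` substring matching on the generated test ids, e.g.
# `pytest -k "pusht and diffusion"` (the exact ids follow pytest's defaults).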
def test_policy(env_name, policy_name, extra_overrides):
    """
    Tests:
    - Making the policy object.
    - Updating the policy with a training batch.
    - Using the policy to select actions at inference time.
    - Applying the selected action to the environment.
    """
    cfg = init_hydra_config(
        DEFAULT_CONFIG_PATH,
        overrides=[
            f"env={env_name}",
            f"policy={policy_name}",
            f"device={DEVICE}",
        ]
        + extra_overrides,
    )
    # Check that we can make the policy object.
    policy = make_policy(cfg)

    # Check that we can make the dataset and the environment used for rollouts.
    dataset = make_dataset(cfg)
    env = make_env(cfg, num_parallel_envs=2)
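    # The dataloader settings below look test-oriented: a tiny batch, shuffling
    # for coverage, and drop_last=True so every batch is a full batch of 2
    # (a reading of the chosen values, not something this file states).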
    dataloader = torch.utils.data.DataLoader(
        dataset,
        num_workers=4,
        batch_size=2,
        shuffle=True,
        pin_memory=DEVICE != "cpu",
        drop_last=True,
    )
    dl_iter = cycle(dataloader)

    batch = next(dl_iter)
    for key in batch:
        batch[key] = batch[key].to(DEVICE, non_blocking=True)
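    # The forward call below is assumed to double as one training update, with
    # the `step` argument presumably driving internal scheduling/logging; this
    # test only checks that the call executes without error.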
    # Test updating the policy.
    policy(batch, step=0)

    # Reset the policy and environment.
    policy.reset()
    observation, _ = env.reset(seed=cfg.seed)

    # Apply the transform to normalize the observations.
    observation = preprocess_observation(observation, dataset.transform)

    # Send the observations to the device (e.g. GPU).
    observation = {key: observation[key].to(DEVICE, non_blocking=True) for key in observation}

    # Get the next action for the environment.
    with torch.inference_mode():
        action = policy.select_action(observation, step=0)

    # Apply the inverse transform to unnormalize the action.
    action = postprocess_action(action, dataset.transform)

    # Test that the selected action can be applied to the environment.
    env.step(action)
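    # A possible follow-up (sketch only, not asserted here): with a
    # gymnasium-style vectorized env, `env.step` should return a 5-tuple and the
    # action batch should have one row per parallel env, e.g.:
    #     observation, reward, terminated, truncated, info = env.step(action)
    #     assert action.shape[0] == 2  # matches num_parallel_envs above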