- Introduce `storage_device` parameter in SAC configuration and training settings
- Update learner server to use configurable storage device for replay buffer
- Reduce online buffer capacity in ManiSkill configuration
- Modify replay buffer initialization to support custom storage device
- Introduce `optimize_memory` parameter to reduce memory usage in replay buffer
- Implement simplified memory optimization by not storing duplicate next_states
- Update learner server and buffer initialization to use memory optimization by default
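A minimal sketch of what the two new options mean in practice; the class and argument names are illustrative, not the exact `lerobot` API. Transitions are stored on `storage_device` (keeping them on the learning GPU avoids a host-to-device copy at every optimization step), and `optimize_memory=True` avoids duplicating `next_state` by reading it from the following slot:

```python
import torch


class ReplayBufferSketch:
    """Illustrative buffer: transitions live on `storage_device`; sampled
    batches are returned on the learning `device`."""

    def __init__(self, capacity, state_dim, action_dim,
                 device="cuda", storage_device="cpu", optimize_memory=True):
        self.device = torch.device(device)
        self.storage_device = torch.device(storage_device)
        self.optimize_memory = optimize_memory
        self.capacity, self.size, self.position = capacity, 0, 0
        self.states = torch.zeros(capacity, state_dim, device=self.storage_device)
        self.actions = torch.zeros(capacity, action_dim, device=self.storage_device)
        self.rewards = torch.zeros(capacity, device=self.storage_device)
        self.dones = torch.zeros(capacity, dtype=torch.bool, device=self.storage_device)
        if not optimize_memory:
            # A separate next_states buffer is only allocated when not optimizing memory.
            self.next_states = torch.zeros(capacity, state_dim, device=self.storage_device)

    def add(self, state, action, reward, next_state, done):
        i = self.position
        self.states[i], self.actions[i] = state, action
        self.rewards[i], self.dones[i] = reward, done
        if not self.optimize_memory:
            self.next_states[i] = next_state
        self.position = (i + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = torch.randint(0, self.size, (batch_size,), device=self.storage_device)
        if self.optimize_memory:
            # The next state of slot i is the state written to slot i + 1.
            # (A real implementation also handles episode boundaries and the
            # write head of the circular buffer; that is omitted here.)
            next_states = self.states[(idx + 1) % self.capacity]
        else:
            next_states = self.next_states[idx]
        batch = (self.states[idx], self.actions[idx], self.rewards[idx], next_states, self.dones[idx])
        return tuple(t.to(self.device) for t in batch)
```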
- Replaced list-based memory storage with pre-allocated tensor storage
- Optimized sampling process with direct tensor indexing
- Added support for DrQ image augmentation during sampling from the offline dataset
- Improved dataset conversion with more robust episode handling
- Enhanced buffer initialization and state tracking
- Added comprehensive testing for buffer conversion and sampling
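The DrQ augmentation applied during sampling is a random shift: replicate-pad the image, then crop back to the original size at a random offset. A minimal, illustrative sketch of the augmentation and of sampling by direct tensor indexing (tensor names and sizes are made up):

```python
import torch
import torch.nn.functional as F


def random_shift(images: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """DrQ-style augmentation: replicate-pad, then random-crop back to the
    original size, with an independent shift per image in the batch."""
    b, _, h, w = images.shape
    padded = F.pad(images, (pad, pad, pad, pad), mode="replicate")
    xs = torch.randint(0, 2 * pad + 1, (b,))
    ys = torch.randint(0, 2 * pad + 1, (b,))
    out = torch.empty_like(images)
    for i in range(b):
        y, x = int(ys[i]), int(xs[i])
        out[i] = padded[i, :, y : y + h, x : x + w]
    return out


# Toy example: pre-allocated image storage and sampling by direct tensor indexing.
buffer_size, batch_size = 1024, 32
image_storage = torch.zeros(buffer_size, 3, 64, 64)   # pre-allocated tensor storage
idx = torch.randint(0, buffer_size, (batch_size,))    # direct tensor indexing
images = random_shift(image_storage[idx])             # DrQ augmentation during sampling
```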
- Reduced image size in ManiSkill environment configuration from 128 to 64
- Added support for truncation in replay buffer and actor server
- Updated SAC policy configuration to use a specific dataset and modify vision encoder settings
- Improved dataset conversion process with progress tracking and task naming
- Added flexibility for joint action space masking in learner server
- Introduced Ensemble and CriticHead classes for more efficient critic network handling
- Added support for multiple camera inputs in observation encoder
- Optimized image encoding by batching image processing
- Updated configuration for ManiSkill environment with reduced image size and action scaling
- Compiled critic networks for improved performance
- Simplified normalization and ensemble handling in critic networks
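One way the multi-camera support and batched image encoding can fit together, as a sketch rather than the exact implementation: all camera streams are stacked along the batch dimension so the vision encoder runs once per batch instead of once per camera (the observation keys and the encoder are placeholders, and the encoder is assumed to return flat feature vectors):

```python
import torch


def encode_cameras(encoder: torch.nn.Module, obs: dict) -> torch.Tensor:
    """Encode every camera stream in a single forward pass."""
    cam_keys = sorted(k for k in obs if k.startswith("observation.images"))
    stacked = torch.cat([obs[k] for k in cam_keys], dim=0)  # (num_cams * B, C, H, W)
    feats = encoder(stacked)                                 # one batched encoder call
    # Split back into per-camera features and concatenate along the feature dim.
    return torch.cat(feats.chunk(len(cam_keys), dim=0), dim=-1)  # (B, num_cams * D)
```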
Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>
- Fixed a major issue with the normalization of actions in the critic's `forward` function: removed the `torch.no_grad` decorator from the normalization function in `normalize.py` (see the sketch below)
- Fixed a performance issue that boosts the optimization frequency: the replay buffer storage device is now set to the same device used for learning.
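Why removing `torch.no_grad` matters: the critic normalizes the actor's actions inside its `forward`, and running that normalization under `no_grad` detaches the actions from the actor's computation graph, so the actor loss stops producing gradients. A small illustration (variable names are placeholders):

```python
import torch


def normalize(actions: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    # Must stay differentiable: gradients from the critic's Q-values have to
    # flow back through the normalized actions into the actor's parameters.
    return (actions - mean) / (std + 1e-8)


# Under `torch.no_grad()`, `normalize(...)` would return a detached tensor and the
# actor update below would produce no gradients for the actor's parameters:
# actions = actor(obs)                                  # part of the actor's graph
# q = critic(obs, normalize(actions, mean, std))
# actor_loss = -q.mean()
```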
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
- Modify logger to support multiple custom step keys
- Update logging method to handle custom step keys more flexibly
- Enhance logging of optimization step and frequency
Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>
Co-authored-by: Khalil Meftah <kmeftah.khalil@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
Co-authored-by: Ke Wang <superwk1017@gmail.com>
- Added NaN detection in the actor, learner, and gym_manipulator loops for the cases where NaNs appear during training (see the sketch below).
- Changed the `non_blocking` flag in the `.to(device)` calls to apply only on CUDA, because non-blocking transfers were causing NaNs when running the policy on MPS.
- Added joint clipping and limits in the env, robot, and policy configs. TODO: clean this part up and consolidate the limits into a single config file.
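A sketch of both fixes, with illustrative helper names: `non_blocking` transfers are only enabled on CUDA, and a small NaN guard can be called on tensors inside the loops:

```python
import torch


def to_device(batch: dict, device: torch.device) -> dict:
    # Non-blocking copies are only used on CUDA; on MPS they were producing NaNs.
    non_blocking = device.type == "cuda"
    return {k: v.to(device, non_blocking=non_blocking) for k, v in batch.items()}


def assert_no_nan(name: str, tensor: torch.Tensor) -> None:
    # Lightweight NaN guard of the kind added to the actor, learner, and gym_manipulator loops.
    if torch.isnan(tensor).any():
        raise ValueError(f"NaN detected in {name}")
```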
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
- Added `lerobot/scripts/server/find_joint_limits.py` to find the min and max joint angles of the motion you want the robot to explore during RL training.
- Added logic in `manipulator.py` to clamp the commanded joint angles to a predefined joint position range. The limits are specified per robot in its yaml config; see `so100.yaml` for an example.
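The clamping itself is a simple clip against per-joint bounds. The sketch below uses made-up limits for a 6-DoF arm; the real values live in the robot's yaml config (e.g. `so100.yaml`) and can be measured with `find_joint_limits.py`:

```python
import numpy as np


def clamp_joint_positions(target: np.ndarray, joint_min: np.ndarray, joint_max: np.ndarray) -> np.ndarray:
    """Clip a target joint-position command to the per-robot limits."""
    return np.clip(target, joint_min, joint_max)


# Example with made-up limits (degrees) for a 6-DoF arm:
limits_min = np.array([-90, -60, -60, -90, -90, -10])
limits_max = np.array([90, 60, 60, 90, 90, 100])
safe_target = clamp_joint_positions(np.array([120, 0, -75, 10, 0, 50]), limits_min, limits_max)
```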
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
- Added several wrappers around the base gym robot env class, including: a time limit, a reward classifier, image cropping, and observation preprocessing.
- Added an interactive script, `crop_roi.py`, that lets the user select the ROI in the observation images and returns the crop values that improve policy and reward-classifier performance.
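As an illustration, the image-cropping wrapper could look roughly like this (the wrapper name, observation key, and crop values are placeholders; `crop_roi.py` is what produces the real crop values):

```python
import gymnasium as gym


class ImageCropWrapper(gym.ObservationWrapper):
    """Crop a fixed region of interest from one image observation (CHW layout assumed)."""

    def __init__(self, env, key: str, top: int, left: int, height: int, width: int):
        super().__init__(env)
        self.key, self.top, self.left = key, top, left
        self.height, self.width = height, width

    def observation(self, obs):
        img = obs[self.key]
        obs[self.key] = img[..., self.top : self.top + self.height, self.left : self.left + self.width]
        return obs


# Wrappers stack around the base robot env, e.g. a crop followed by a time limit:
# env = gym.wrappers.TimeLimit(
#     ImageCropWrapper(base_env, "observation.images.front", 40, 60, 128, 128),
#     max_episode_steps=200,
# )
```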
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
- Added `torch.compile` to the actor and learner servers.
- Organized imports in `train_sac.py`
- Optimized the parameter push by not sending the frozen pre-trained encoder.
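The push optimization amounts to filtering out parameters that are never updated before serializing the state sent to the actor. A minimal sketch with illustrative names:

```python
import torch


def trainable_state_dict(policy: torch.nn.Module) -> dict:
    """Keep only parameters that are actually trained, skipping the frozen
    pre-trained vision encoder to shrink the payload pushed to the actor."""
    return {name: p.detach().cpu() for name, p in policy.named_parameters() if p.requires_grad}
```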
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Added functions for converting the replay buffer to and from the LeRobotDataset format. To save the replay buffer, it is first converted to a LeRobotDataset and stored locally; loading performs the reverse conversion.
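A rough sketch of one half of that conversion, grouping buffer transitions into episodes before they are written out (the actual LeRobotDataset writing step is omitted, and all names here are illustrative):

```python
def group_transitions_into_episodes(transitions: list[dict]) -> list[list[dict]]:
    """Split a flat list of transitions into episodes using the done/truncated
    flags, which is the structure a dataset-style export expects."""
    episodes, current = [], []
    for t in transitions:
        current.append(t)
        if t["done"] or t.get("truncated", False):
            episodes.append(current)
            current = []
    if current:  # keep a trailing, unfinished episode as well
        episodes.append(current)
    return episodes
```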
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
- Optimized the critic design, improving the performance of the learner loop by a factor of 2
- Cleaned the code and fixed style issues
- Completed the config with an `actor_learner_config` field that contains the host IP and port elements required by the actor-learner servers.
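The shape of that field might look like the following sketch; the key names here are illustrative and the exact ones may differ in the actual config:

```python
from dataclasses import dataclass


@dataclass
class ActorLearnerConfig:
    # Connection details the actor uses to reach the learner server.
    learner_host: str = "127.0.0.1"
    learner_port: int = 50051
```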
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>