Commit Graph

906 Commits

Author SHA1 Message Date
AdilZouitine 0185a0b6fd Fix cuda graph break 2025-04-18 15:10:22 +02:00
s1lent4gnt 70d418935d Fix: Prevent Invalid next_state References When optimize_memory=True (#918) 2025-04-18 15:10:22 +02:00
pre-commit-ci[bot] eb44a06a9b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:10:22 +02:00
Michel Aractingi 8eb3c1510c Added support for controlling the gripper with the pygame interface of gamepad
Minor modifications in gym_manipulator to quantize the gripper actions
clamped the observations after F.resize in ConvertToLeRobotObservation wrapper due to a bug in F.resize, images were returned exceeding the maximum value of 1.0
2025-04-18 15:10:22 +02:00
AdilZouitine 4d5ecb082e Refactor SACPolicy for improved type annotations and readability
- Enhanced type annotations for variables in the `SACPolicy` class to improve code clarity.
- Updated method calls to use keyword arguments for better readability.
- Streamlined the extraction of batch components, ensuring consistent typing across the class methods.
2025-04-18 15:10:22 +02:00
AdilZouitine 6e687e2910 Refactor SACPolicy and learner_server for improved clarity and functionality
- Updated the `forward` method in `SACPolicy` to handle loss computation for actor, critic, and temperature models.
- Replaced direct calls to `compute_loss_*` methods with a unified `forward` method in `learner_server`.
- Enhanced batch processing by consolidating input parameters into a single dictionary for better readability and maintainability.
- Removed redundant code and improved documentation for clarity.
2025-04-18 15:10:22 +02:00
AdilZouitine eb710647bf Refactor actor_server.py for improved structure and logging
- Consolidated logging initialization and enhanced logging for actor processes.
- Streamlined the handling of gRPC connections and process management.
- Improved readability by organizing core algorithm functions and communication functions.
- Added detailed comments and documentation for clarity.
- Ensured proper queue management and shutdown handling for actor processes.
2025-04-18 15:10:22 +02:00
AdilZouitine 176557d770 Refactor learner_server.py for improved structure and clarity
- Removed unused imports and streamlined the code structure.
- Consolidated logging initialization and enhanced logging for training processes.
- Improved handling of training state loading and resume logic.
- Refactored transition and interaction message processing for better readability and maintainability.
- Added detailed comments and documentation for clarity.
2025-04-18 15:10:22 +02:00
AdilZouitine 3beab33fac Refactor imports in modeling_sac.py for improved organization
- Rearranged import statements for better readability.
- Removed unused imports and streamlined the code structure.
2025-04-18 15:10:22 +02:00
AdilZouitine c0ba4b4954 Refactor SACConfig properties for improved readability
- Simplified the `image_features` property to directly iterate over `input_features`.
- Removed unused imports and unnecessary code related to main execution, enhancing clarity and maintainability.
2025-04-18 15:10:22 +02:00
AdilZouitine 8fb373aeb2 fix 2025-04-18 15:10:22 +02:00
AdilZouitine 5a0ee06651 Enhance logging for actor and learner servers
- Implemented process-specific logging for actor and learner servers to improve traceability.
- Created a dedicated logs directory and ensured it exists before logging.
- Initialized logging with explicit log files for each process, including actor transitions, interactions, and policy.
- Updated the actor CLI to validate configuration and set up logging accordingly.
2025-04-18 15:10:22 +02:00
Michel Aractingi 05a237ce10 Added gripper control mechanism to gym_manipulator
Moved HilSerl env config to configs/env/configs.py
fixes in actor_server and modeling_sac and configuration_sac
added the possibility of ignoring missing keys in env_cfg in get_features_from_env_config function
2025-04-18 15:10:22 +02:00
AdilZouitine 88cc2b8fc8 Add WrapperConfig for environment wrappers and update SACConfig properties
- Introduced `WrapperConfig` dataclass for environment wrapper configurations.
- Updated `ManiskillEnvConfig` to include a `wrapper` field for enhanced environment management.
- Modified `SACConfig` to return `None` for `observation_delta_indices` and `action_delta_indices` properties.
- Refactored `make_robot_env` function to improve readability and maintainability.
2025-04-18 15:10:22 +02:00
Michel Aractingi b69132c79d Change HILSerlRobotEnvConfig to inherit from EnvConfig
Added support for hil_serl classifier to be trained with train.py
run classifier training by python lerobot/scripts/train.py --policy.type=hilserl_classifier
fixes in find_joint_limits, control_robot, end_effector_control_utils
2025-04-18 15:10:21 +02:00
AdilZouitine db897a1619 [WIP] Update SAC configuration and environment settings
- Reduced frame rate in `ManiskillEnvConfig` from 400 to 200.
- Enhanced `SACConfig` with new dataclasses for actor, learner, and network configurations.
- Improved input and output feature management in `SACConfig`.
- Refactored `actor_server` and `learner_server` to access configuration properties directly.
- Updated training pipeline to validate configurations and handle dataset repo IDs more robustly.
2025-04-18 15:09:46 +02:00
AdilZouitine 0b5b62c8fb Add wandb run id in config 2025-04-18 15:09:46 +02:00
AdilZouitine 056f79d358 [WIP] Non functional yet
Add ManiSkill environment configuration and wrappers

- Introduced `VideoRecordConfig` for video recording settings.
- Added `ManiskillEnvConfig` to encapsulate environment-specific configurations.
- Implemented various wrappers for the ManiSkill environment, including observation and action scaling.
- Enhanced the `make_maniskill` function to create a wrapped ManiSkill environment with video recording and observation processing.
- Updated the `actor_server` and `learner_server` to utilize the new configuration structure.
- Refactored the training pipeline to accommodate the new environment and policy configurations.
2025-04-18 15:09:46 +02:00
Michel Aractingi 114ec644d0 Change config logic in:
- gym_manipulator
- find_joint_limits
- end_effector_utils
2025-04-18 15:09:45 +02:00
AdilZouitine 26ee8b6ae5 Add .devcontainer to .gitignore for improved development environment management 2025-04-18 15:09:27 +02:00
AdilZouitine 38e8864284 Add task field to frame_dict in ReplayBuffer and simplify save_episode calls
- Introduced a new "task" field in frame_dict to meet the requirements of LeRobotDataset.
- Removed task_name parameter from save_episode calls for consistency.
2025-04-18 15:09:27 +02:00
AdilZouitine 80d566eb56 Handle new config with sac 2025-04-18 15:09:27 +02:00
AdilZouitine bb5a95889f Handle multi optimizers 2025-04-18 15:09:27 +02:00
pre-commit-ci[bot] 0ea27704f6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:09:25 +02:00
Michel Aractingi 2abbd60a0d Removed depleted files and scripts 2025-04-18 15:07:48 +02:00
pre-commit-ci[bot] 1c8daf11fd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:07:46 +02:00
AdilZouitine cdcf346061 Update tensor device assignment in ReplayBuffer class
- Changed the device assignment for tensors in the ReplayBuffer class from `device` to `storage_device` for consistency and improved resource management.
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] 42f95e827d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine 618ed00d45 Initialize log_alpha with the logarithm of temperature_init in SACPolicy
- Updated the SACPolicy class to set log_alpha using the logarithm of the initial temperature value from the configuration.
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] 50d8db481e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine e4a5971ffd Remove unused functions and imports from modeling_sac.py
- Deleted the `find_and_copy_params` function and the `Ensemble` class, as they were deemed unnecessary.
- Cleaned up imports by removing `from_modules` from `tensordict` to enhance code clarity.
- Simplified the assertion in the `Policy` class for better readability.
2025-04-18 15:06:52 +02:00
AdilZouitine 36f9ccd851 Add intervention rate tracking in act_with_policy function
- Introduced counters for tracking intervention steps and total steps during training.
- Calculated and logged the intervention rate at the end of each episode.
- Reset intervention counters after each episode to ensure accurate tracking.
2025-04-18 15:06:52 +02:00
AdilZouitine 787aee0e60 - Updated the logging condition to use `log_freq` directly instead of accessing it through `cfg.training.log_freq` for improved readability and speed. 2025-04-18 15:06:52 +02:00
Eugene Mironov 0341a38fdd [PORT HIL-SERL] Optimize training loop, extract config usage (#855)
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
2025-04-18 15:06:52 +02:00
AdilZouitine ffbed4a141 Enhance training information logging in learner server
- Added tracking for replay buffer size and offline replay buffer size during training steps.
2025-04-18 15:06:52 +02:00
AdilZouitine 03fe0f054b Update configuration files for improved performance and flexibility
- Increased frame rate in `maniskill_example.yaml` from 20 to 400 for enhanced simulation speed.
- Updated `sac_maniskill.yaml` to set `dataset_repo_id` to null and adjusted `grad_clip_norm` from 10.0 to 40.0.
- Changed `storage_device` from "cpu" to "cuda" for better resource utilization.
- Modified `save_freq` from 2000000 to 1000000 to optimize saving intervals.
- Enhanced input normalization parameters for `observation.state` and `observation.image` in SAC policy.
- Adjusted `num_critics` from 10 to 2 and `policy_parameters_push_frequency` from 1 to 4 for improved training dynamics.
- Updated `learner_server.py` to utilize `offline_buffer_capacity` for replay buffer initialization.
- Changed action multiplier in `maniskill_manipulator.py` from 1 to 0.03 for finer control over actions.
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] fd74c194b6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine 0959694bab Refactor SACPolicy and learner server for improved replay buffer management
- Updated SACPolicy to create critic heads using a list comprehension for better readability.
- Simplified the saving and loading of models using `save_model` and `load_model` functions from the safetensors library.
- Introduced `initialize_offline_replay_buffer` function in the learner server to streamline offline dataset handling and replay buffer initialization.
- Enhanced logging for dataset loading processes to improve traceability during training.
2025-04-18 15:06:52 +02:00
Michel Aractingi 7b01e16439 Add end effector action space to hil-serl (#861)
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-18 15:06:52 +02:00
AdilZouitine 66816fd871 Enhance SAC configuration and policy with gradient clipping and temperature management
- Introduced `grad_clip_norm` parameter in SAC configuration for gradient clipping
- Updated SACPolicy to store temperature as an instance variable for consistent usage
- Modified loss calculations in SACPolicy to utilize the instance temperature
- Enhanced MLP and CriticHead to support a customizable final activation function
- Implemented gradient clipping in the learner server during training steps for both actor and critic
- Added tracking for gradient norms in training information
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] 599326508f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:52 +02:00
AdilZouitine 2f04d0d2b9 Add custom save and load methods for SAC policy
- Implement `_save_pretrained` method to handle TensorDict state saving
- Add `_from_pretrained` class method for loading SAC policy from files
- Create utility function `find_and_copy_params` to handle parameter copying
2025-04-18 15:06:52 +02:00
AdilZouitine e002c5ec56 Remove torch.no_grad decorator and optimize next action prediction in SAC policy
- Removed `@torch.no_grad` decorator from Unnormalize forward method

- Added TODO comment for optimizing next action prediction in SAC policy
- Minor formatting adjustment in NaN assertion for log standard deviation
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
2025-04-18 15:06:52 +02:00
s1lent4gnt 3dfb37e976 [Port HIL-SERL] Balanced sampler function speed up and refactor to align with train.py (#715)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-18 15:06:52 +02:00
Eugene Mironov b6a2200983 [HIL-SERL] Migrate threading to multiprocessing (#759)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-04-18 15:06:52 +02:00
pre-commit-ci[bot] 85fe8a3f4e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-18 15:06:51 +02:00
AdilZouitine bb69cb3c8c Add storage device configuration for SAC policy and replay buffer
- Introduce `storage_device` parameter in SAC configuration and training settings
- Update learner server to use configurable storage device for replay buffer
- Reduce online buffer capacity in ManiSkill configuration
- Modify replay buffer initialization to support custom storage device
2025-04-18 15:04:58 +02:00
AdilZouitine ae51c19b3c Add memory optimization option to ReplayBuffer
- Introduce `optimize_memory` parameter to reduce memory usage in replay buffer
- Implement simplified memory optimization by not storing duplicate next_states
- Update learner server and buffer initialization to use memory optimization by default
2025-04-18 15:04:58 +02:00
AdilZouitine 9ea79f8a76 Add storage device parameter to replay buffer initialization
- Specify storage device for replay buffer to optimize memory management
2025-04-18 15:04:58 +02:00
AdilZouitine 1d4ec50a58 Refactor ReplayBuffer with tensor-based storage and improved sampling efficiency
- Replaced list-based memory storage with pre-allocated tensor storage
- Optimized sampling process with direct tensor indexing
- Added support for DrQ image augmentation during sampling for offline dataset
- Improved dataset conversion with more robust episode handling
- Enhanced buffer initialization and state tracking
- Added comprehensive testing for buffer conversion and sampling
2025-04-18 15:04:58 +02:00