Commit Graph

924 Commits

Author SHA1 Message Date
AdilZouitine ee4ebeac9b match target entropy hil serl
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine fe7b47f459 stick to hil serl nn architecture
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine 044ca3b039 Refactor modeling_sac and parameter handling for clarity and reusability.
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
2025-04-16 16:46:37 +02:00
AdilZouitine bc36c69b71 fix encoder training 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] 2b9b05f1ba [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
Michel Aractingi 9eec7b8bb0 General fixes in code, removed delta action, fixed grasp penalty, added logic to put gripper reward in info 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] a80a9cf379 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine 7a42af835e fix caching and dataset stats is optional 2025-04-16 16:46:37 +02:00
AdilZouitine 9751328783 Add rounding for safety 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] 7225bc74a3 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine 03b1644bf7 fix sign issue 2025-04-16 16:46:37 +02:00
AdilZouitine 9b6e5a383f Refactor complementary_info handling in ReplayBuffer 2025-04-16 16:46:37 +02:00
AdilZouitine 86466b025f Handle gripper penalty 2025-04-16 16:46:37 +02:00
AdilZouitine 54745f111d fix caching 2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] 82584cca78 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
AdilZouitine d3a8c2c247 fix indentation issue 2025-04-16 16:46:37 +02:00
AdilZouitine 74c11c4a75 Enhance SAC configuration and replay buffer with asynchronous prefetching support
- Added async_prefetch parameter to SACConfig for improved buffer management.
- Implemented get_iterator method in ReplayBuffer to support asynchronous prefetching of batches.
- Updated learner_server to utilize the new iterator for online and offline sampling, enhancing training efficiency.
2025-04-16 16:46:37 +02:00
AdilZouitine 2d932b710c Enhance SACPolicy to support shared encoder and optimize action selection
- Cached encoder output in select_action method to reduce redundant computations.
- Updated action selection and grasp critic calls to utilize cached encoder features when available.
2025-04-16 16:46:37 +02:00
AdilZouitine a54baceabb Enhance SACPolicy and learner server for improved grasp critic integration
- Updated SACPolicy to conditionally compute grasp critic losses based on the presence of discrete actions.
- Refactored the forward method to handle grasp critic model selection and loss computation more clearly.
- Adjusted learner server to utilize optimized parameters for grasp critic during training.
- Improved action handling in the ManiskillMockGripperWrapper to accommodate both tuple and single action inputs.
2025-04-16 16:46:37 +02:00
AdilZouitine 077d18b439 Refactor SACPolicy for improved readability and action dimension handling
- Cleaned up code formatting for better readability, including consistent spacing and removal of unnecessary blank lines.
- Consolidated continuous action dimension calculation to enhance clarity and maintainability.
- Simplified loss return statements in the forward method to improve code structure.
- Ensured grasp critic parameters are included conditionally based on configuration settings.
2025-04-16 16:46:37 +02:00
AdilZouitine c6cd1475a7 Add mock gripper support and enhance SAC policy action handling
- Introduced mock_gripper parameter in ManiskillEnvConfig to enable gripper simulation.
- Added ManiskillMockGripperWrapper to adjust action space for environments with discrete actions.
- Updated SACPolicy to compute continuous action dimensions correctly, ensuring compatibility with the new gripper setup.
- Refactored action handling in the training loop to accommodate the changes in action dimensions.
2025-04-16 16:46:37 +02:00
AdilZouitine e35ee47b07 Refactor SAC policy and training loop to enhance discrete action support
- Updated SACPolicy to conditionally compute losses for grasp critic based on num_discrete_actions.
- Simplified forward method to return loss outputs as a dictionary for better clarity.
- Adjusted learner_server to handle both main and grasp critic losses during training.
- Ensured optimizers are created conditionally for grasp critic based on configuration settings.
2025-04-16 16:46:37 +02:00
AdilZouitine c3f2487026 Refactor SAC configuration and policy to support discrete actions
- Removed GraspCriticNetworkConfig class and integrated its parameters into SACConfig.
- Added num_discrete_actions parameter to SACConfig for better action handling.
- Updated SACPolicy to conditionally create grasp critic networks based on num_discrete_actions.
- Enhanced grasp critic forward pass to handle discrete actions and compute losses accordingly.
2025-04-16 16:46:37 +02:00
Michel Aractingi c621077b62 Added Gripper quantization wrapper and grasp penalty
removed complementary info from buffer and learner server
removed get_gripper_action function
added gripper parameters to `common/envs/configs.py`
2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] f5cfd9fd48 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-04-16 16:46:37 +02:00
s1lent4gnt 22da1739b1 Add grasp critic to the training loop
- Integrated the grasp critic gradient update to the training loop in learner_server
- Added Adam optimizer and configured grasp critic learning rate in configuration_sac
- Added target critics networks update after the critics gradient step
2025-04-16 16:46:37 +02:00
s1lent4gnt d38d5f988d Add get_gripper_action method to GamepadController 2025-04-16 16:46:37 +02:00
s1lent4gnt 8d1936ffe0 Add gripper penalty wrapper 2025-04-16 16:46:37 +02:00
s1lent4gnt cef944e1b1 Add complementary info in the replay buffer
- Added complementary info in the add method
- Added complementary info in the sample method
2025-04-16 16:46:37 +02:00
s1lent4gnt 384eb2cd07 Add grasp critic
- Implemented grasp critic to evaluate gripper actions
- Added corresponding config parameters for tuning
2025-04-16 16:46:37 +02:00
pre-commit-ci[bot] 0f706ce543 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-31 13:59:32 +00:00
AdilZouitine 026ad463a9 Fix convergence of sac, multiple torch compile on the same model caused divergence 2025-03-31 13:54:21 +00:00
AdilZouitine 8494634d48 Fix cuda graph break 2025-03-31 07:59:56 +00:00
s1lent4gnt 66c3672738 Fix: Prevent Invalid next_state References When optimize_memory=True (#918) 2025-03-31 09:43:40 +02:00
pre-commit-ci[bot] c05e4835d0 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-03-28 17:20:39 +00:00
Michel Aractingi 808cf63221 Added support for controlling the gripper with the pygame interface of gamepad
Minor modifications in gym_manipulator to quantize the gripper actions
clamped the observations after F.resize in ConvertToLeRobotObservation wrapper due to a bug in F.resize, images were returned exceeding the maximum value of 1.0
2025-03-28 17:18:48 +00:00
AdilZouitine 0150139668 Refactor SACPolicy for improved type annotations and readability
- Enhanced type annotations for variables in the `SACPolicy` class to improve code clarity.
- Updated method calls to use keyword arguments for better readability.
- Streamlined the extraction of batch components, ensuring consistent typing across the class methods.
2025-03-28 17:18:48 +00:00
AdilZouitine b3ad63cf6e Refactor SACPolicy and learner_server for improved clarity and functionality
- Updated the `forward` method in `SACPolicy` to handle loss computation for actor, critic, and temperature models.
- Replaced direct calls to `compute_loss_*` methods with a unified `forward` method in `learner_server`.
- Enhanced batch processing by consolidating input parameters into a single dictionary for better readability and maintainability.
- Removed redundant code and improved documentation for clarity.
2025-03-28 17:18:48 +00:00
AdilZouitine 8b02e81bb5 Refactor actor_server.py for improved structure and logging
- Consolidated logging initialization and enhanced logging for actor processes.
- Streamlined the handling of gRPC connections and process management.
- Improved readability by organizing core algorithm functions and communication functions.
- Added detailed comments and documentation for clarity.
- Ensured proper queue management and shutdown handling for actor processes.
2025-03-28 17:18:48 +00:00
AdilZouitine dcce446a66 Refactor learner_server.py for improved structure and clarity
- Removed unused imports and streamlined the code structure.
- Consolidated logging initialization and enhanced logging for training processes.
- Improved handling of training state loading and resume logic.
- Refactored transition and interaction message processing for better readability and maintainability.
- Added detailed comments and documentation for clarity.
2025-03-28 17:18:48 +00:00
AdilZouitine 82a6b69e0e Refactor imports in modeling_sac.py for improved organization
- Rearranged import statements for better readability.
- Removed unused imports and streamlined the code structure.
2025-03-28 17:18:48 +00:00
AdilZouitine 6f7024242a Refactor SACConfig properties for improved readability
- Simplified the `image_features` property to directly iterate over `input_features`.
- Removed unused imports and unnecessary code related to main execution, enhancing clarity and maintainability.
2025-03-28 17:18:48 +00:00
AdilZouitine 3c56ad33c3 fix 2025-03-28 17:18:48 +00:00
AdilZouitine 49baa1ff49 Enhance logging for actor and learner servers
- Implemented process-specific logging for actor and learner servers to improve traceability.
- Created a dedicated logs directory and ensured it exists before logging.
- Initialized logging with explicit log files for each process, including actor transitions, interactions, and policy.
- Updated the actor CLI to validate configuration and set up logging accordingly.
2025-03-28 17:18:48 +00:00
Michel Aractingi 02b9ea9446 Added gripper control mechanism to gym_manipulator
Moved HilSerl env config to configs/env/configs.py
fixes in actor_server and modeling_sac and configuration_sac
added the possibility of ignoring missing keys in env_cfg in get_features_from_env_config function
2025-03-28 17:18:48 +00:00
AdilZouitine 79e0f6e06c Add WrapperConfig for environment wrappers and update SACConfig properties
- Introduced `WrapperConfig` dataclass for environment wrapper configurations.
- Updated `ManiskillEnvConfig` to include a `wrapper` field for enhanced environment management.
- Modified `SACConfig` to return `None` for `observation_delta_indices` and `action_delta_indices` properties.
- Refactored `make_robot_env` function to improve readability and maintainability.
2025-03-28 17:18:48 +00:00
Michel Aractingi d0b7690bc0 Change HILSerlRobotEnvConfig to inherit from EnvConfig
Added support for hil_serl classifier to be trained with train.py
run classifier training by python lerobot/scripts/train.py --policy.type=hilserl_classifier
fixes in find_joint_limits, control_robot, end_effector_control_utils
2025-03-28 17:18:48 +00:00
AdilZouitine 052a4acfc2 [WIP] Update SAC configuration and environment settings
- Reduced frame rate in `ManiskillEnvConfig` from 400 to 200.
- Enhanced `SACConfig` with new dataclasses for actor, learner, and network configurations.
- Improved input and output feature management in `SACConfig`.
- Refactored `actor_server` and `learner_server` to access configuration properties directly.
- Updated training pipeline to validate configurations and handle dataset repo IDs more robustly.
2025-03-28 17:18:48 +00:00
AdilZouitine 626e5dd35c Add wandb run id in config 2025-03-28 17:18:48 +00:00
AdilZouitine dd37bd412e [WIP] Non functional yet
Add ManiSkill environment configuration and wrappers

- Introduced `VideoRecordConfig` for video recording settings.
- Added `ManiskillEnvConfig` to encapsulate environment-specific configurations.
- Implemented various wrappers for the ManiSkill environment, including observation and action scaling.
- Enhanced the `make_maniskill` function to create a wrapped ManiSkill environment with video recording and observation processing.
- Updated the `actor_server` and `learner_server` to utilize the new configuration structure.
- Refactored the training pipeline to accommodate the new environment and policy configurations.
2025-03-28 17:18:48 +00:00