173 Commits

Author SHA1 Message Date
KeWang1017 ecb91b37eb Refactor SACPolicy for improved action sampling and standard deviation handling
- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.
2025-03-24 13:24:23 +01:00
KeWang1017 c89bcc5aa8 trying to get sac running 2025-03-24 13:24:23 +01:00
Michel Aractingi cc85bca2b5 Added normalization schemes and style checks 2025-03-24 13:24:23 +01:00
Michel Aractingi 3b07766c33 added optimizer and sac to factory.py 2025-03-24 13:23:53 +01:00
Eugene Mironov 287968b418 [HIL-SERL PORT] Fix linter issues (#588) 2025-03-24 13:23:02 +01:00
Eugene Mironov c9f1a037e3 [Port Hil-SERL] Add unit tests for the reward classifier & fix imports & check script (#578) 2025-03-24 13:23:02 +01:00
Michel Aractingi 8a7f74ee65 added comments from kewang 2025-03-24 13:21:05 +01:00
KeWang1017 8220546036 Enhance SAC configuration and policy with new parameters and subsampling logic
- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
2025-03-24 13:21:05 +01:00
KeWang 214beec994 Port SAC WIP (#581)
Co-authored-by: KeWang1017 <ke.wang@helloleap.ai>
2025-03-24 13:21:05 +01:00
Michel Aractingi 909ca8d9b6 completed losses 2025-03-24 13:21:05 +01:00
Michel Aractingi 5fe56e0a49 nit in control_robot.py 2025-03-24 13:21:05 +01:00
Yoel 0ebdae8a40 Reward classifier and training (#528)
Co-authored-by: Daniel Ritchie <daniel@brainwavecollective.ai>
Co-authored-by: resolver101757 <kelster101757@hotmail.com>
Co-authored-by: Jannik Grothusen <56967823+J4nn1K@users.noreply.github.com>
Co-authored-by: Remi <re.cadene@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-03-24 13:20:43 +01:00
Pepijn e8159997c7 User/pepijn/2025 03 17 act different image shapes (#870) 2025-03-18 11:09:05 +01:00
Steven Palma 5e9473806c refactor(config): Move device & amp args to PreTrainedConfig (#812)
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
2025-03-06 17:59:28 +01:00
Steven Palma 5d24ce3160 chore(doc): add license header to all files (#818) 2025-03-05 17:56:51 +01:00
Yachen Kang b80e55ca44 change "actions_id_pad" to "actions_is_pad"(🐛 Bug) (#774)
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-03-05 01:31:56 +01:00
Simon Alibert a1809ad3de Add typos checks (#770) 2025-02-25 23:51:15 +01:00
Simon Alibert 3354d919fc LeRobotDataset v2.1 (#711)
Co-authored-by: Remi <remi.cadene@huggingface.co>
Co-authored-by: Remi Cadene <re.cadene@gmail.com>
2025-02-25 15:27:29 +01:00
Simon Alibert c4c2ce04e7 Update pre-commits (#733) 2025-02-15 15:51:17 +01:00
Simon Alibert e71095960f Fixes following #670 (#719) 2025-02-12 12:53:55 +01:00
Simon Alibert 90e099b39f Remove offline training, refactor `train.py` and logging/checkpointing (#670)
Co-authored-by: Remi <remi.cadene@huggingface.co>
2025-02-11 10:36:06 +01:00
Remi 638d411cd3 Add Pi0 (#681)
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
2025-02-04 18:01:04 +01:00
Simon Alibert 3c0a209f9f Simplify configs (#550)
Co-authored-by: Remi <remi.cadene@huggingface.co>
Co-authored-by: HUANG TZU-CHUN <137322177+tc-huang@users.noreply.github.com>
2025-01-31 13:57:37 +01:00
Hirokazu Ishida 538455a965 feat: enable to use multiple rgb encoders per camera in diffusion policy (#484)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-10-30 11:00:05 +01:00
Alexander Soare a60d27b132 Raise ValueError if horizon is incompatible with downsampling (#422) 2024-09-09 17:22:46 +01:00
Joe Clinton f17d9a2ba1 Bug: Fix VQ-Bet not working when n_action_pred_token=1 (#420)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-09-09 09:41:13 +01:00
Jack Vial b2896d38f5 fix(act): n_vae_encoder_layers config parameter wasn't being used (#400) 2024-09-02 18:29:27 +01:00
NielsRogge 86bbd16d43 Improve discoverability on the hub (#325)
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
2024-08-19 15:16:46 +02:00
Alexander Soare 0f6e0f6d74 Fix input dim (#365) 2024-08-19 11:42:32 +01:00
Halvard Bariller 7a3cb1ad34 Adjust the timestamps' description in Diffusion Policy (#343)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-07-26 12:47:03 +01:00
Alexander Soare f8a6574698 Add online training with TD-MPC as proof of concept (#338) 2024-07-25 11:16:38 +01:00
Alexander Soare abbb1d2367 Make sure policies don't mutate the batch (#323) 2024-07-22 20:38:33 +01:00
Alexander Soare c0101f0948 Fix ACT temporal ensembling (#319) 2024-07-16 10:27:21 +01:00
Alexander Soare 471eab3d7e Make ACT compatible with "observation.environment_state" (#314) 2024-07-11 13:12:22 +01:00
Seungjae Lee 64425d5e00 Bug fix: fix error when setting select_target_actions_indices in vqbet (#310)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-07-10 17:56:11 +01:00
Alexander Soare cc2f6e7404 Train diffusion pusht_keypoints (#307)
Co-authored-by: Remi <re.cadene@gmail.com>
2024-07-09 12:35:50 +01:00
Simon Alibert 74362ac453 Add VQ-BeT copyrights (#299) 2024-07-04 13:02:31 +02:00
Alexander Soare 342f429f1c Add test to make sure policy dataclass configs match yaml configs (#292) 2024-06-26 09:09:40 +01:00
Seungjae Lee 7d1542cae1 Add VQ-BeT (#166) 2024-06-26 08:55:02 +01:00
Thomas Wolf 48951662f2 Bug fix: missing attention mask in VAE encoder in ACT policy (#279)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-06-19 12:07:21 +01:00
Jihoon Oh b72d574891 fix Unet global_cond_dim to use state dim, not action dim (#278) 2024-06-17 15:17:28 +01:00
Alexander Soare 15dd682714 Add multi-image support to diffusion policy (#218) 2024-06-17 08:11:20 +01:00
Wael Karkoub 54c9776bde Improves Type Annotations (#252) 2024-06-10 19:09:48 +01:00
Ruijie b0d954c6e1 Fix bug in normalize to avoid divide by zero (#239)
Co-authored-by: rj <rj@teleopstrio-razer.lan>
Co-authored-by: Remi <re.cadene@gmail.com>
2024-06-04 12:21:28 +02:00
Alexander Soare cf15cba5fc Remove redundant slicing operation in Diffusion Policy (#240) 2024-06-03 13:04:24 +01:00
Remi d585c73f9f Add real-world support for ACT on Aloha/Aloha2 (#228)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-05-31 15:31:02 +02:00
Alexander Soare 57fb5fe8a6 Improve documentation on VAE encoder inputs (#215) 2024-05-30 19:16:44 +02:00
Alexander Soare 3d625ae6d3 Handle `crop_shape=None` in Diffusion Policy (#219) 2024-05-28 18:27:33 +01:00
Radek Osmulski 3b86050ab0 throw an error if config.do_maks_loss and action_is_pad not provided in batch (#213)
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-05-27 09:06:26 +01:00
Alexander Soare 5ec0af62c6 Explain why n_encoder_layers=1 (#193) 2024-05-17 15:05:40 +01:00