KeWang1017
ecb91b37eb
Refactor SACPolicy for improved action sampling and standard deviation handling
...
- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.
These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.
2025-03-24 13:24:23 +01:00
KeWang1017
c89bcc5aa8
trying to get sac running
2025-03-24 13:24:23 +01:00
Michel Aractingi
cc85bca2b5
Added normalization schemes and style checks
2025-03-24 13:24:23 +01:00
Michel Aractingi
3b07766c33
added optimizer and sac to factory.py
2025-03-24 13:23:53 +01:00
Eugene Mironov
287968b418
[HIL-SERL PORT] Fix linter issues (#588)
2025-03-24 13:23:02 +01:00
Eugene Mironov
c9f1a037e3
[Port Hil-SERL] Add unit tests for the reward classifier & fix imports & check script (#578)
2025-03-24 13:23:02 +01:00
Michel Aractingi
8a7f74ee65
added comments from kewang
2025-03-24 13:21:05 +01:00
KeWang1017
8220546036
Enhance SAC configuration and policy with new parameters and subsampling logic
...
- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.
These changes improve the flexibility and performance of the SAC implementation.
2025-03-24 13:21:05 +01:00
KeWang
214beec994
Port SAC WIP (#581)
...
Co-authored-by: KeWang1017 <ke.wang@helloleap.ai>
2025-03-24 13:21:05 +01:00
Michel Aractingi
909ca8d9b6
completed losses
2025-03-24 13:21:05 +01:00
Michel Aractingi
5fe56e0a49
nit in control_robot.py
2025-03-24 13:21:05 +01:00
Yoel
0ebdae8a40
Reward classifier and training (#528)
...
Co-authored-by: Daniel Ritchie <daniel@brainwavecollective.ai>
Co-authored-by: resolver101757 <kelster101757@hotmail.com>
Co-authored-by: Jannik Grothusen <56967823+J4nn1K@users.noreply.github.com>
Co-authored-by: Remi <re.cadene@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
2025-03-24 13:20:43 +01:00
Pepijn
e8159997c7
User/pepijn/2025 03 17 act different image shapes (#870)
2025-03-18 11:09:05 +01:00
Steven Palma
5e9473806c
refactor(config): Move device & amp args to PreTrainedConfig (#812)
...
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
2025-03-06 17:59:28 +01:00
Steven Palma
5d24ce3160
chore(doc): add license header to all files (#818)
2025-03-05 17:56:51 +01:00
Yachen Kang
b80e55ca44
change "actions_id_pad" to "actions_is_pad" (🐛 Bug) (#774)
...
Co-authored-by: Steven Palma <imstevenpmwork@ieee.org>
2025-03-05 01:31:56 +01:00
Simon Alibert
a1809ad3de
Add typos checks (#770)
2025-02-25 23:51:15 +01:00
Simon Alibert
3354d919fc
LeRobotDataset v2.1 (#711)
...
Co-authored-by: Remi <remi.cadene@huggingface.co>
Co-authored-by: Remi Cadene <re.cadene@gmail.com>
2025-02-25 15:27:29 +01:00
Simon Alibert
c4c2ce04e7
Update pre-commits (#733)
2025-02-15 15:51:17 +01:00
Simon Alibert
e71095960f
Fixes following #670 (#719)
2025-02-12 12:53:55 +01:00
Simon Alibert
90e099b39f
Remove offline training, refactor `train.py` and logging/checkpointing (#670)
...
Co-authored-by: Remi <remi.cadene@huggingface.co>
2025-02-11 10:36:06 +01:00
Remi
638d411cd3
Add Pi0 (#681)
...
Co-authored-by: Simon Alibert <simon.alibert@huggingface.co>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
2025-02-04 18:01:04 +01:00
Simon Alibert
3c0a209f9f
Simplify configs (#550)
...
Co-authored-by: Remi <remi.cadene@huggingface.co>
Co-authored-by: HUANG TZU-CHUN <137322177+tc-huang@users.noreply.github.com>
2025-01-31 13:57:37 +01:00
Hirokazu Ishida
538455a965
feat: enable to use multiple rgb encoders per camera in diffusion policy (#484)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-10-30 11:00:05 +01:00
Alexander Soare
a60d27b132
Raise ValueError if horizon is incompatible with downsampling (#422)
2024-09-09 17:22:46 +01:00
Joe Clinton
f17d9a2ba1
Bug: Fix VQ-Bet not working when n_action_pred_token=1 (#420)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-09-09 09:41:13 +01:00
Jack Vial
b2896d38f5
fix(act): n_vae_encoder_layers config parameter wasn't being used (#400)
2024-09-02 18:29:27 +01:00
NielsRogge
86bbd16d43
Improve discoverability on the hub (#325)
...
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
2024-08-19 15:16:46 +02:00
Alexander Soare
0f6e0f6d74
Fix input dim (#365)
2024-08-19 11:42:32 +01:00
Halvard Bariller
7a3cb1ad34
Adjust the timestamps' description in Diffusion Policy (#343)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-07-26 12:47:03 +01:00
Alexander Soare
f8a6574698
Add online training with TD-MPC as proof of concept (#338)
2024-07-25 11:16:38 +01:00
Alexander Soare
abbb1d2367
Make sure policies don't mutate the batch (#323)
2024-07-22 20:38:33 +01:00
Alexander Soare
c0101f0948
Fix ACT temporal ensembling (#319)
2024-07-16 10:27:21 +01:00
Alexander Soare
471eab3d7e
Make ACT compatible with "observation.environment_state" (#314)
2024-07-11 13:12:22 +01:00
Seungjae Lee
64425d5e00
Bug fix: fix error when setting select_target_actions_indices in vqbet (#310)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-07-10 17:56:11 +01:00
Alexander Soare
cc2f6e7404
Train diffusion pusht_keypoints (#307)
...
Co-authored-by: Remi <re.cadene@gmail.com>
2024-07-09 12:35:50 +01:00
Simon Alibert
74362ac453
Add VQ-BeT copyrights (#299)
2024-07-04 13:02:31 +02:00
Alexander Soare
342f429f1c
Add test to make sure policy dataclass configs match yaml configs (#292)
2024-06-26 09:09:40 +01:00
Seungjae Lee
7d1542cae1
Add VQ-BeT (#166)
2024-06-26 08:55:02 +01:00
Thomas Wolf
48951662f2
Bug fix: missing attention mask in VAE encoder in ACT policy (#279)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-06-19 12:07:21 +01:00
Jihoon Oh
b72d574891
fix Unet global_cond_dim to use state dim, not action dim (#278)
2024-06-17 15:17:28 +01:00
Alexander Soare
15dd682714
Add multi-image support to diffusion policy (#218)
2024-06-17 08:11:20 +01:00
Wael Karkoub
54c9776bde
Improves Type Annotations (#252)
2024-06-10 19:09:48 +01:00
Ruijie
b0d954c6e1
Fix bug in normalize to avoid divide by zero (#239)
...
Co-authored-by: rj <rj@teleopstrio-razer.lan>
Co-authored-by: Remi <re.cadene@gmail.com>
2024-06-04 12:21:28 +02:00
Alexander Soare
cf15cba5fc
Remove redundant slicing operation in Diffusion Policy (#240)
2024-06-03 13:04:24 +01:00
Remi
d585c73f9f
Add real-world support for ACT on Aloha/Aloha2 (#228)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-05-31 15:31:02 +02:00
Alexander Soare
57fb5fe8a6
Improve documentation on VAE encoder inputs (#215)
2024-05-30 19:16:44 +02:00
Alexander Soare
3d625ae6d3
Handle `crop_shape=None` in Diffusion Policy (#219)
2024-05-28 18:27:33 +01:00
Radek Osmulski
3b86050ab0
throw an error if config.do_mask_loss and action_is_pad not provided in batch (#213)
...
Co-authored-by: Alexander Soare <alexander.soare159@gmail.com>
2024-05-27 09:06:26 +01:00
Alexander Soare
5ec0af62c6
Explain why n_encoder_layers=1 (#193)
2024-05-17 15:05:40 +01:00