- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.

Name | ||
---|---|---|
act | ||
diffusion | ||
hilserl | ||
pi0 | ||
sac | ||
tdmpc | ||
vqbet | ||
__init__.py | ||
factory.py | ||
normalize.py | ||
pretrained.py | ||
utils.py |
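
The commit notes above mention sampling actions from a distribution, returning log probabilities, and clamping the standard deviation. A minimal sketch of that idea for a single scalar action is given below. It is not the code in `sac`: the clamp bounds `LOG_STD_MIN` and `LOG_STD_MAX`, the function name `sample_action`, and the tanh squashing step (common in SAC implementations, but not stated in the commit message) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical clamp bounds for the log standard deviation (assumed values).
LOG_STD_MIN = -5.0
LOG_STD_MAX = 2.0


def sample_action(mean, log_std, rng=random):
    """Sample a tanh-squashed Gaussian action; return (action, log_prob)."""
    # Clamp log-std so the standard deviation cannot become extreme.
    log_std = max(LOG_STD_MIN, min(LOG_STD_MAX, log_std))
    std = math.exp(log_std)

    # Reparameterized sample: u = mean + std * eps, with eps ~ N(0, 1).
    eps = rng.gauss(0.0, 1.0)
    u = mean + std * eps

    # Log-density of u under the (unsquashed) Gaussian.
    log_prob = -0.5 * eps * eps - log_std - 0.5 * math.log(2.0 * math.pi)

    # Squash into [-1, 1] and apply the change-of-variables correction.
    # The small constant keeps the logarithm finite near the boundaries.
    action = math.tanh(u)
    log_prob -= math.log(1.0 - action * action + 1e-6)
    return action, log_prob
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible. A real policy would typically do the same computation on batched tensors.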