Hyperspherical Normalization for Scalable Deep Reinforcement Learning.
Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, and Jaegul Choo.
In International Conference on Machine Learning, May 2025.
Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.
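The hyperspherical normalization mentioned in the abstract amounts to keeping weight rows and feature vectors on the unit hypersphere so their norms cannot grow during training. The sketch below is a minimal illustration of that idea, assuming PyTorch; the class name HypersphericalLinear, the projection schedule, and all hyperparameters are hypothetical and are not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalLinear(nn.Module):
    """Linear layer whose weight rows are re-projected onto the unit
    hypersphere, constraining weight-norm growth (illustrative sketch)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.project_weights()  # start on the sphere

    @torch.no_grad()
    def project_weights(self):
        # Re-normalize each weight row to unit l2 norm.
        w = self.linear.weight
        w.div_(w.norm(dim=1, keepdim=True).clamp_min(1e-8))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # l2-normalize the incoming features as well, so both weights
        # and features stay on the hypersphere.
        x = F.normalize(x, dim=-1)
        return self.linear(x)

# Usage sketch: after each optimizer step, re-project every such layer
# so that gradient updates cannot inflate the weight norms.
#   optimizer.step()
#   for m in model.modules():
#       if isinstance(m, HypersphericalLinear):
#           m.project_weights()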
@InProceedings{pstone_simba,
  author    = {Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
  title     = {Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
  booktitle = {International Conference on Machine Learning},
  year      = {2025},
  month     = {May},
  location  = {Vancouver, Canada},
  abstract  = {Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.},
}