Peter Stone's Selected Publications



Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hyperspherical Normalization for Scalable Deep Reinforcement Learning.
Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, and Jaegul Choo.
In International Conference on Machine Learning, May 2025.

Download

[PDF] (1.9MB)

Abstract

Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.
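The core mechanism named in the abstract, hyperspherical normalization, amounts to projecting weights (or features) onto the unit hypersphere so their L2 norms cannot grow during training. The following is a minimal NumPy sketch of that idea only; it is not the paper's implementation, and the function name and learning rate are illustrative.

```python
import numpy as np

def hypersphere_project(x, axis=-1, eps=1e-8):
    """L2-normalize vectors along `axis`, i.e. project them onto the unit hypersphere."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / (norm + eps)

# Illustration: after a plain gradient step, re-project each weight row
# onto the unit sphere so weight norms stay fixed instead of drifting.
rng = np.random.default_rng(0)
W = hypersphere_project(rng.normal(size=(4, 8)))
grad = rng.normal(size=(4, 8))
W = hypersphere_project(W - 0.01 * grad)

print(np.linalg.norm(W, axis=-1))  # every row has (approximately) unit norm
```

Because the projection fixes the norm, optimization effectively happens on the sphere's surface, which is one way to prevent the unbounded norm growth the abstract identifies as a source of instability under non-stationary RL data.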

BibTeX Entry

@InProceedings{pstone_simba,
  author   = {Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
  title    = {Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
  booktitle = {International Conference on Machine Learning},
  year     = {2025},
  month    = {May},
  location = {Vancouver, Canada},
  abstract = {Scaling up the model size and computation has brought consistent performance
improvements in supervised learning. However, this lesson often fails to apply to
reinforcement learning (RL) because training the model on non-stationary data
easily leads to overfitting and unstable optimization. In response, we introduce
SimbaV2, a novel RL architecture designed to stabilize optimization by (i)
constraining the growth of weight and feature norm by hyperspherical
normalization; and (ii) using a distributional value estimation with reward
scaling to maintain stable gradients under varying reward magnitudes. Using the
soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger
models and greater compute, achieving state-of-the-art performance on 57
continuous control tasks across 4 domains.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Sep 29, 2025 19:22:37