Peter Stone's Selected Publications



Hyperspherical Normalization for Scalable Deep Reinforcement Learning

Hyperspherical Normalization for Scalable Deep Reinforcement Learning.
Hojoon Lee, Youngdo Lee, Takuma Seno, Donghu Kim, Peter Stone, and Jaegul Choo.
In International Conference on Machine Learning, May 2025.

Download

[PDF] (1.9MB)

Abstract

Scaling up the model size and computation has brought consistent performance improvements in supervised learning. However, this lesson often fails to apply to reinforcement learning (RL) because training the model on non-stationary data easily leads to overfitting and unstable optimization. In response, we introduce SimbaV2, a novel RL architecture designed to stabilize optimization by (i) constraining the growth of weight and feature norm by hyperspherical normalization; and (ii) using a distributional value estimation with reward scaling to maintain stable gradients under varying reward magnitudes. Using the soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger models and greater compute, achieving state-of-the-art performance on 57 continuous control tasks across 4 domains.
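The core mechanism named in the abstract, hyperspherical normalization, amounts to projecting weights (or features) onto the unit hypersphere so their L2 norms cannot grow during training. The following is a minimal NumPy sketch of that idea only; it is not the paper's implementation, and the function name and learning rate are illustrative.

```python
import numpy as np

def hypersphere_project(x, axis=-1, eps=1e-8):
    """L2-normalize vectors along `axis`, i.e. project them onto the unit hypersphere."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / (norm + eps)

# Illustration: after a plain gradient step, re-project each weight row
# onto the unit sphere so weight norms stay fixed instead of drifting.
rng = np.random.default_rng(0)
W = hypersphere_project(rng.normal(size=(4, 8)))
grad = rng.normal(size=(4, 8))
W = hypersphere_project(W - 0.01 * grad)

print(np.linalg.norm(W, axis=-1))  # every row has (approximately) unit norm
```

Because the projection fixes the norm, optimization effectively happens on the sphere's surface, which is one way to prevent the unbounded norm growth the abstract identifies as a source of instability under non-stationary RL data.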

BibTeX Entry

@InProceedings{pstone_simba,
  author   = {Hojoon Lee and Youngdo Lee and Takuma Seno and Donghu Kim and Peter Stone and Jaegul Choo},
  title    = {Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
  booktitle = {International Conference on Machine Learning},
  year     = {2025},
  month    = {May},
  location = {Vancouver, Canada},
  abstract = {Scaling up the model size and computation has brought consistent performance
improvements in supervised learning. However, this lesson often fails to apply to
reinforcement learning (RL) because training the model on non-stationary data
easily leads to overfitting and unstable optimization. In response, we introduce
SimbaV2, a novel RL architecture designed to stabilize optimization by (i)
constraining the growth of weight and feature norm by hyperspherical
normalization; and (ii) using a distributional value estimation with reward
scaling to maintain stable gradients under varying reward magnitudes. Using the
soft actor-critic as a base algorithm, SimbaV2 scales up effectively with larger
models and greater compute, achieving state-of-the-art performance on 57
continuous control tasks across 4 domains.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Sep 29, 2025 19:22:37