Peter Stone's Selected Publications



Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets.
Alexander Levine, Peter Stone, and Amy Zhang.
In Reinforcement Learning Conference, August 2025.

Download

[PDF] (770.2 kB)

Abstract

While sequential decision-making environments often involve high-dimensional observations, not all features of these observations are relevant for control. In particular, the observation space may capture factors of the environment which are not controllable by the agent, but which add complexity to the observation space. The need to ignore these "noise" features in order to operate in a tractably-small state space poses a challenge for efficient policy learning. Due to the abundance of video data available in many such environments, task-independent representation learning from action-free offline data offers an attractive solution. However, recent work has highlighted theoretical limitations in action-free learning under the Exogenous Block MDP (Ex-BMDP) model, where temporally-correlated noise features are present in the observations. To address these limitations, we identify a realistic setting where representation learning in Ex-BMDPs becomes tractable: when action-free video data from multiple agents with differing policies are available. Concretely, this paper introduces CRAFT (Comparison-based Representations from Action-Free Trajectories), a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example, offering a foundation for practical methods in similar settings.
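The abstract does not spell out CRAFT's algorithm, but the core intuition it states — controllable features have dynamics that differ across agents with differing policies, while exogenous noise features do not — can be illustrated in a minimal sketch. Everything below (the toy two-feature environment, the two drift policies, and the per-dimension comparison) is a hypothetical construction for illustration, not the paper's method:

```python
import random
from collections import Counter

random.seed(0)

def rollout(policy, steps=5000):
    """Generate (s, s') pairs from a toy 2-feature environment.
    dim 0 is controllable (driven by the policy); dim 1 is exogenous
    noise that flips with probability 0.3 regardless of the agent."""
    x, n = 0, 0
    pairs = []
    for _ in range(steps):
        x2 = max(0, min(3, x + policy(x)))          # controllable dynamics
        n2 = 1 - n if random.random() < 0.3 else n  # agent-independent noise
        pairs.append(((x, n), (x2, n2)))
        x, n = x2, n2
    return pairs

def transition_table(pairs, dim):
    """Empirical P(s'[dim] | s[dim]) as nested dicts of probabilities."""
    counts = {}
    for s, s2 in pairs:
        counts.setdefault(s[dim], Counter())[s2[dim]] += 1
    return {k: {v: c / sum(ctr.values()) for v, c in ctr.items()}
            for k, ctr in counts.items()}

def dynamics_gap(pa, pb):
    """Max total-variation distance between the two agents' per-state
    transition distributions for one feature dimension."""
    gap = 0.0
    for k in set(pa) & set(pb):
        keys = set(pa[k]) | set(pb[k])
        tv = 0.5 * sum(abs(pa[k].get(v, 0) - pb[k].get(v, 0)) for v in keys)
        gap = max(gap, tv)
    return gap

# Action-free trajectories from two agents with differing policies.
traj_a = rollout(lambda x: +1)   # agent A drifts right
traj_b = rollout(lambda x: -1)   # agent B drifts left

for dim, name in [(0, "controllable"), (1, "noise")]:
    gap = dynamics_gap(transition_table(traj_a, dim),
                       transition_table(traj_b, dim))
    print(f"dim {dim} ({name}): cross-agent dynamics gap = {gap:.2f}")
```

The controllable dimension shows a large cross-agent gap (the two policies push it in opposite directions), while the noise dimension's dynamics are nearly identical across agents, so comparing diverse datasets flags which features are controllable — even with no actions recorded.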

BibTeX Entry

@InProceedings{alex_rlc2025,
  author   = {Alexander Levine and Peter Stone and Amy Zhang},
  title    = {Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets},
  booktitle = {Reinforcement Learning Conference},
  year     = {2025},
  month    = {August},
  location = {Edmonton, Canada},
  abstract = {While sequential decision-making environments often involve high-dimensional
observations, not all features of these observations are relevant for control. In
particular, the observation space may capture factors of the environment which
are not controllable by the agent, but which add complexity to the observation
space. The need to ignore these ``noise'' features in order to operate in a
space. The need to ignore these ""noise"" features in order to operate in a
tractably-small state space poses a challenge for efficient policy learning. Due
to the abundance of video data available in many such environments,
task-independent representation learning from action-free offline data offers an
attractive solution. However, recent work has highlighted theoretical limitations
in action-free learning under the Exogenous Block MDP (Ex-BMDP) model, where
temporally-correlated noise features are present in the observations. To address
these limitations, we identify a realistic setting where representation learning
in Ex-BMDPs becomes tractable: when action-free video data from multiple agents
with differing policies are available. Concretely, this paper introduces CRAFT
(Comparison-based Representations from Action-Free Trajectories), a
sample-efficient algorithm leveraging differences in controllable feature
dynamics across agents to learn representations. We provide theoretical
guarantees for CRAFT's performance and demonstrate its feasibility on a toy
example, offering a foundation for practical methods in similar settings.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Thu Oct 02, 2025 22:46:25