Peter Stone's Selected Publications


The Nature of Belief-Directed Exploratory Choice in Human Decision-Making

W. Bradley Knox, A. Ross Otto, Peter Stone, and Bradley Love. The Nature of Belief-Directed Exploratory Choice in Human Decision-Making. Frontiers in Psychology, 2(398), January 2012.
Frontiers in Psychology
Download article from publisher (free)
A follow-up commentary by Erica Yu.

Abstract

In non-stationary environments, there is a conflict between exploiting currently favored options and gaining information by exploring lesser-known options that in the past have proven less rewarding. Optimal decision making in such tasks requires considering future states of the environment (i.e., planning) and properly updating beliefs about the state of the environment after observing outcomes associated with choices. Optimal belief-updating is reflective in that beliefs can change without directly observing environmental change. For example, after ten seconds elapse, one might correctly believe that a traffic light last observed to be red is now more likely to be green. To understand human decision-making when rewards associated with choice options change over time, we develop a variant of the classic bandit task that is both rich enough to encompass relevant phenomena and sufficiently tractable to allow for ideal actor analysis of sequential choice behavior. We evaluate whether people update beliefs about the state of the environment in a reflexive (i.e., only in response to observed changes in reward structure) or reflective manner. In contrast to purely "random" accounts of exploratory behavior, model-based analyses of the subjects' choices and latencies indicate that people are reflective belief-updaters. However, unlike the Ideal Actor model, our analyses indicate that people's choice behavior does not reflect consideration of future environmental states. Thus, although people update beliefs in a reflective manner consistent with the ideal actor, they do not engage in optimal long-term planning, but instead myopically choose the option on every trial that is believed to have the highest immediate payoff.

BibTeX Entry

@ARTICLE{FRONTIERS12-knox,
  AUTHOR={W. Bradley Knox and A. Ross Otto and Peter Stone and Bradley Love},
  TITLE={The Nature of Belief-Directed Exploratory Choice in Human Decision-Making},
  JOURNAL={Frontiers in Psychology},
  VOLUME={2},
  YEAR={2012},
  MONTH={January},
  NUMBER={398},
  URL={http://www.frontiersin.org/Journal/Abstract.aspx?s=196&name=cognitive_science&ART_DOI=10.3389/fpsyg.2011.00398},
  DOI={10.3389/fpsyg.2011.00398},
  ISSN={1664-1078},
  ABSTRACT={In non-stationary environments, there is a conflict between exploiting currently favored options and gaining information by exploring lesser-known options that in the past have proven less rewarding. Optimal decision making in such tasks requires considering future states of the environment (i.e., planning) and properly updating beliefs about the state of the environment after observing outcomes associated with choices. Optimal belief-updating is reflective in that beliefs can change without directly observing environmental change. For example, after ten seconds elapse, one might correctly believe that a traffic light last observed to be red is now more likely to be green. To understand human decision-making when rewards associated with choice options change over time, we develop a variant of the classic bandit task that is both rich enough to encompass relevant phenomena and sufficiently tractable to allow for ideal actor analysis of sequential choice behavior. We evaluate whether people update beliefs about the state of the environment in a reflexive (i.e., only in response to observed changes in reward structure) or reflective manner. In contrast to purely ``random'' accounts of exploratory behavior, model-based analyses of the subjects' choices and latencies indicate that people are reflective belief-updaters. However, unlike the Ideal Actor model, our analyses indicate that people's choice behavior does not reflect consideration of future environmental states. Thus, although people update beliefs in a reflective manner consistent with the ideal actor, they do not engage in optimal long-term planning, but instead myopically choose the option on every trial that is believed to have the highest immediate payoff.},
  wwwnote={<a href="http://www.frontiersin.org/psychology">Frontiers in Psychology</a>
    <br><a href="http://www.frontiersin.org/Journal/DownloadFile.ashx?pdf=1&FileId=%204406&articleId=%2019266&Version=%201&ContentTypeId=21&FileName=%20fpsyg-02-00398.pdf">Download article from publisher (free)</a><br>A <a href="http://www.frontiersin.org/Cognitive_Science/10.3389/fpsyg.2012.00541/full">follow-up commentary</a> by Erica Yu.}
}
