 Almost Optimal Exploration in Multi-Armed Bandits
 Zohar Karnin,  Tomer Koren, and  Oren Somekh, 2013
 Information Complexity in Bandit Subset Selection
 Emilie Kaufmann and  Shivaram Kalyanakrishnan, 2013
 Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
 Victor Gabillon,  Mohammad Ghavamzadeh, and  Alessandro Lazaric, 2012
 Planning in Reward-Rich Domains via PAC Bandits
 Sergiu Goschin,  Ari Weinstein,  Michael L. Littman, and  Erick Chastain, 2012
 Learning Methods for Sequential Decision Making with Imperfect Representations
 Shivaram Kalyanakrishnan, 2011
 Learning to Predict Humanoid Fall
 Shivaram Kalyanakrishnan and  Ambarish Goswami, 2011
 Characterizing reinforcement learning methods through parameterized learning problems
 Shivaram Kalyanakrishnan and  Peter Stone, 2011
 On Learning with Imperfect Representations
 Shivaram Kalyanakrishnan and  Peter Stone, 2011
 On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer
 Daniel Urieli,  Patrick MacAlpine,  Shivaram Kalyanakrishnan,  Yinon Bentor, and  Peter Stone, 2011
 Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning
 Shimon Whiteson,  Brian Tanner,  Matthew E. Taylor, and  Peter Stone, 2011
 Exploiting Best-Match Equations for Efficient Reinforcement Learning
 Harm van Seijen,  Shimon Whiteson,  Hado van Hasselt, and  Marco Wiering, 2011
 Insights in Reinforcement Learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
 Hado Philip van Hasselt, 2011
 Success, strategy and skill: an experimental study
 Christopher Archibald,  Alon Altman, and  Yoav Shoham, 2010
 Best Arm Identification in Multi-Armed Bandits
 Jean-Yves Audibert,  Sébastien Bubeck, and  Rémi Munos, 2010
 UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
 Peter Auer and  Ronald Ortner, 2010
 Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
 Carlton Downey and  Scott Sanner, 2010
 A Brief Survey of Parametric Value Function Approximation
 Matthieu Geist and  Olivier Pietquin, 2010
 Simulation optimization using the cross-entropy method with optimal computing budget allocation
 Donghai He,  Loo Hay Lee,  Chun-Hung Chen,  Michael C. Fu, and  Segev Wasserkrug, 2010
 An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
 Junya Honda and  Akimichi Takemura, 2010
 Near-optimal Regret Bounds for Reinforcement Learning
 Thomas Jaksch,  Ronald Ortner, and  Peter Auer, 2010
 Non-Stochastic Bandit Slate Problems
 Satyen Kale,  Lev Reyzin, and  Robert E. Schapire, 2010
 Predicting Falls of a Humanoid Robot through Machine Learning
 Shivaram Kalyanakrishnan and  Ambarish Goswami, 2010
 Three Humanoid Soccer Platforms: Comparison and Synthesis
 Shivaram Kalyanakrishnan,  Todd Hester,  Michael Quinlan,  Yinon Bentor, and  Peter Stone, 2010
 Efficient Selection of Multiple Bandit Arms: Theory and Practice
 Shivaram Kalyanakrishnan and  Peter Stone, 2010
 Learning Complementary Multiagent Behaviors: A Case Study
 Shivaram Kalyanakrishnan and  Peter Stone, 2010
 Fall Detection of Two-legged Walking Robots using Multi-way Principal Components Analysis
 J. G. Daniël Karssen and  Martijn Wisse, 2010
 Regret bounds for sleeping experts and bandits
 Robert Kleinberg,  Alexandru Niculescu-Mizil, and  Yogeshwer Sharma, 2010
 Finite-Sample Analysis of LSTD
 Alessandro Lazaric,  Mohammad Ghavamzadeh, and  Rémi Munos, 2010
 A contextual-bandit approach to personalized news article recommendation
 Lihong Li,  Wei Chu,  John Langford, and  Robert E. Schapire, 2010
 Estimating Learning Rates in Evolution and TDL: Results on a Simple Grid-World Problem
 Simon M. Lucas, 2010
 Toward Off-Policy Learning Control with Function Approximation
 Hamid Reza Maei,  Csaba Szepesvári,  Shalabh Bhatnagar, and  Richard S. Sutton, 2010
 Biped Walk Learning Through Playback and Corrective Demonstration
 Çetin Meriçli and  Manuela Veloso, 2010
 Generalized Direction Changing Fall Control of Humanoid Robots Among Multiple Objects
 Umashankar Nagarajan and  Ambarish Goswami, 2010
 Relative Entropy Policy Search
 Jan Peters,  Katharina Mülling, and  Yasemin Altün, 2010
 Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
 Marek Petrik,  Gavin Taylor,  Ron Parr, and  Shlomo Zilberstein, 2010
 Biped Walking using Coronal and Sagittal Movements based on Truncated Fourier Series
 Nima Shafii,  Luis Paulo Reis, and  Nuno Lao, 2010
 Application of Machine Learning To Epileptic Seizure Detection
 Ali Shoeb and  John Guttag, 2010
 Algorithms for Reinforcement Learning
 Csaba Szepesvári, 2010
 SZ-Tetris as a Benchmark for Studying Key Problems of Reinforcement Learning
 István Szita and  Csaba Szepesvári, 2010
 Model-based reinforcement learning with nearly tight exploration complexity bounds
 István Szita and  Csaba Szepesvári, 2010
 Reinforcement learning of motor skills in high dimensions: A path integral approach
 Evangelos Theodorou,  Jonas Buchli, and  Stefan Schaal, 2010
 Improvements on Learning Tetris with Cross-Entropy
 Christophe Thierry and  Bruno Scherrer, 2010
 Building Controllers for Tetris
 Christophe Thierry and  Bruno Scherrer, 2010
 ε-First Policies for Budget-Limited Multi-Armed Bandits
 Long Tran-Thanh,  Archie Chapman,  Enrique Munoz de Cote,  Alex Rogers, and  Nicholas R. Jennings, 2010
 Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery
 Philip A. Warrick,  Emily F. Hamilton,  Robert E. Kearney, and  Doina Precup, 2010
 Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning
 Shimon Whiteson,  Matthew E. Taylor, and  Peter Stone, 2010
 Fall Detection and Management in Biped Humanoid Robots
 Javier Ruiz-del-Solar,  Javier Moya, and  Isao Parra-Tsunekawa, 2010
 Modeling billiards games
 Christopher Archibald and  Yoav Shoham, 2009
 Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
 Jean-Yves Audibert,  Rémi Munos, and  Csaba Szepesvári, 2009
 On the Evolution of Artificial Tetris Players
 Amine Boumaza, 2009
 Pure Exploration in Multi-armed Bandits Problems
 Sébastien Bubeck,  Rémi Munos, and  Gilles Stoltz, 2009
 Combinatorial Bandits
 Nicolò Cesa-Bianchi and  Gábor Lugosi, 2009
 The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning
 Carlos Diuk,  Lihong Li, and  Bethany R. Leffler, 2009
 Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem
 Damien Ernst,  Mevludin Glavic,  Florin Capitanescu, and  Louis Wehenkel, 2009
 The Knowledge-Gradient Policy for Correlated Normal Beliefs
 Peter Frazier,  Warren Powell, and  Savas Dayanik, 2009
 A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The NeuroHassle Approach
 Thomas Gabel,  Martin Riedmiller, and  Florian Trost, 2009
 Computational Sustainability: Computational Methods for a Sustainable Environment, Economy, and Society
 Carla P. Gomes, 2009
 Improving Optimistic Exploration in Model-Free Reinforcement Learning
 Marek Grześ and  Daniel Kudenko, 2009
 The WEKA Data Mining Software: An Update
 Mark Hall,  Eibe Frank,  Geoffrey Holmes,  Bernhard Pfahringer,  Peter Reutemann, and  Ian H. Witten, 2009
 The CMA Evolution Strategy: A Tutorial
 Nikolaus Hansen, 2009
 A Method for Handling Uncertainty in Evolutionary Optimization With an Application to Feedback Control of Combustion
 Nikolaus Hansen,  André S.P. Niederberger,  Lino Guzzella, and  Petros Koumoutsakos, 2009
 Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
 Verena Heidrich-Meisner and  Christian Igel, 2009
 Neuroevolution strategies for episodic reinforcement learning
 Verena Heidrich-Meisner and  Christian Igel, 2009
 Probabilistic Balance Monitoring for Bipedal Robots
 O. Höhn and  W. Gerth, 2009
 SarsaLandmark: an algorithm for learning in POMDPs with landmarks
 Michael R. James and  Satinder Singh, 2009
 Generalized AMOC Curves For Evaluation and Improvement of Event Surveillance
 Xia Jiang,  Gregory F. Cooper, and  Daniel B. Neill, 2009
 Feature Selection for Value Function Approximation Using Bayesian Model Selection
 Tobias Jung and  Peter Stone, 2009
 An empirical analysis of value function-based and policy search reinforcement learning
 Shivaram Kalyanakrishnan and  Peter Stone, 2009
 The UT Austin Villa 3D Simulation Soccer Team 2008
 Shivaram Kalyanakrishnan,  Yinon Bentor, and  Peter Stone, 2009
 Fall detection in walking robots by multi-way principal component analysis
 J. G. Daniël Karssen and  Martijn Wisse, 2009
 Learning motor primitives for robotics
 Jens Kober and  Jan Peters, 2009
 Evolving Neural Networks for Strategic Decision-Making Problems
 Nate Kohl and  Risto Miikkulainen, 2009
 Regularization and feature selection in least-squares temporal difference learning
 J. Zico Kolter and  Andrew Y. Ng, 2009
 Automatic Parameter Optimization for a Dynamic Robot Simulation
 Tim Laue and  Matthias Hebbel, 2009
 Learning Representation and Control in Markov Decision Processes: New Frontiers
 Sridhar Mahadevan, 2009
 Nonparametric representation of an approximated Poincaré map for learning biped locomotion
 Jun Morimoto and  Christopher G. Atkeson, 2009
 Reinforcement learning in the brain
 Yael Niv, 2009
 Biasing Approximate Dynamic Programming with a Lower Discount Factor
 Marek Petrik and  Bruno Scherrer, 2009
 Feature Discovery in Approximate Dynamic Programming
 Philippe Preux,  Sertan Girgin, and  Manuel Loth, 2009
 Reinforcement learning for robot soccer
 Martin Riedmiller,  Thomas Gabel,  Roland Hafner, and  Sascha Lange, 2009
 Evolving Multi-modal Behavior in NPCs
 Jacob Schrum and  Risto Miikkulainen, 2009
 Reinforcement Learning in Finite MDPs: PAC Analysis
 Alexander L. Strehl,  Lihong Li, and  Michael L. Littman, 2009
 Stochastic search using the natural gradient
 Yi Sun,  Daan Wierstra,  Tom Schaul, and  Jürgen Schmidhuber, 2009
 Fast gradient-descent methods for temporal-difference learning with linear function approximation
 Richard S. Sutton,  Hamid Reza Maei,  Doina Precup,  Shalabh Bhatnagar,  David Silver,  Csaba Szepesvári, and  Eric Wiewiora, 2009
 Efficient covariance matrix update for variable metric evolution strategies
 Thorsten Suttorp,  Nikolaus Hansen, and  Christian Igel, 2009
 Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
 Michael T. Todd,  Yael Niv, and  Jonathan D. Cohen, 2009
 Ontogenetic and Phylogenetic Reinforcement Learning
 Julian Togelius,  Tom Schaul,  Daan Wierstra,  Christian Igel,  Faustino Gomez, and  Jürgen Schmidhuber, 2009
 Generalized Domains for Empirical Evaluations in Reinforcement Learning
 Shimon Whiteson,  Brian Tanner,  Matthew E. Taylor, and  Peter Stone, 2009
 Designing falling motions for a humanoid soccer goalie
 Tobias Wilken,  Marcell Missura, and  Sven Behnke, 2009
 Safe Fall: Humanoid robot fall direction change through intelligent stepping and inertia shaping
 Seung-kook Yun,  Ambarish Goswami, and  Yoshiaki Sakagami, 2009
 CMDragons 2009 Extended Team Description
 Stefan Zickler,  James Bruce,  Joydeep Biswas,  Michael Licitra, and  Manuela Veloso, 2009
 A Theoretical and Empirical Analysis of Expected Sarsa
 Harm van Seijen,  Hado van Hasselt,  Shimon Whiteson, and  Marco Wiering, 2009
 Learning to fall: Designing low damage fall sequences for humanoid soccer robots
 J. Ruiz-del-Solar,  R. Palma-Amestoy,  R. Marchant,  I. Parra-Tsunekawa, and  P. Zegers, 2009
 Incremental Natural Actor-Critic Algorithms
 Shalabh Bhatnagar,  Richard S. Sutton,  Mohammad Ghavamzadeh, and  Mark Lee, 2008
 A Comprehensive Survey of Multiagent Reinforcement Learning
 Lucian Buşoniu,  Robert Babuška, and  Bart De Schutter, 2008
 An empirical evaluation of supervised learning in high dimensions
 Rich Caruana,  Nikolaos Karampatziakis, and  Ainur Yessenalina, 2008
 Efficient Simulation Budget Allocation for Selecting an Optimal Subset
 Chun-Hung Chen,  Donghai He,  Michael Fu, and  Loo Hay Lee, 2008
 The Role of Value Systems in Decision Making
 Peter Dayan, 2008
 Decision Theory, Reinforcement Learning, and the Brain
 Peter Dayan and  Nathaniel D. Daw, 2008
 Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
 Gen Endo,  Jun Morimoto,  Takamitsu Matsubara,  Jun Nakanishi, and  Gordon Cheng, 2008
 Simulation-Based Approach to General Game Playing
 Hilmar Finnsson and  Yngvi Björnsson, 2008
 Feature Discovery in Reinforcement Learning Using Genetic Programming
 Sertan Girgin and  Philippe Preux, 2008
 Accelerated Neural Evolution through Cooperatively Coevolved Synapses
 Faustino Gomez,  Jürgen Schmidhuber, and  Risto Miikkulainen, 2008
 Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
 Arthur Guez,  Robert D. Vincent,  Massimo Avoli, and  Joelle Pineau, 2008
 Similarities and differences between policy gradient methods and evolution strategies
 Verena Heidrich-Meisner and  Christian Igel, 2008
 Evolution Strategies for Direct Policy Search
 Verena Heidrich-Meisner and  Christian Igel, 2008
 Variable Metric Reinforcement Learning Methods Applied to the Noisy Mountain Car Problem
 Verena Heidrich-Meisner and  Christian Igel, 2008
 Temporal Difference Updating without a Learning Rate
 Marcus Hutter and  Shane Legg, 2008
 A new perspective to the keepaway soccer: the takers
 Atil Iscen and  Umut Erogul, 2008
 Model-Based Reinforcement Learning in a Complex Domain
 Shivaram Kalyanakrishnan,  Peter Stone, and  Yaxin Liu, 2008
 Cross-Entropy Method for Reinforcement Learning
 Steijn Kistemaker, 2008
 Multi-armed bandits in metric spaces
 Robert Kleinberg,  Aleksandrs Slivkins, and  Eli Upfal, 2008
 Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
 William B. Langdon,  Riccardo Poli,  Nicholas Freitag McPhee, and  John R. Koza, 2008
 A worst-case comparison between temporal difference and residual gradient with linear function approximation
 Lihong Li, 2008
 An analysis of reinforcement learning with function approximation
 Francisco S. Melo,  Sean P. Meyn, and  M. Isabel Ribeiro, 2008
 Analysis of an Evolutionary Reinforcement Learning Method in a Multiagent Domain
 Jan Hendrik Metzen,  Mark Edgington,  Yohannes Kassahun, and  Frank Kirchner, 2008
 Empirical Bernstein stopping
 Volodymyr Mnih,  Csaba Szepesvári, and  Jean-Yves Audibert, 2008
 Real-time selection and generation of fall damage reduction actions for humanoid robots
 Kunihiro Ogata,  Koji Terada, and  Yasuo Kuniyoshi, 2008
 Advanced Data Mining Techniques
 David L. Olson and  Dursun Delen, 2008
 An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
 Ronald Parr,  Lihong Li,  Gavin Taylor,  Christopher Painter-Wakefield, and  Michael L. Littman, 2008
 Reinforcement learning of motor skills with policy gradients
 Jan Peters and  Stefan Schaal, 2008
 Natural Actor-Critic
 Jan Peters and  Stefan Schaal, 2008
 Sample-based Learning and Search with Permanent and Transient Memories
 David Silver,  Richard S. Sutton, and  Martin Müller, 2008
 Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
 Richard S. Sutton,  Csaba Szepesvári,  Alborz Geramifard, and  Michael Bowling, 2008
 The many faces of optimism: a unifying approach
 István Szita and  András Lőrincz, 2008
 Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
 Gerald Tesauro,  Rajarshi Das,  Hoi Chan,  Jeffrey O. Kephart,  Charles Lefurgy,  David W. Levine, and  Freeman Rawson, 2008
 Viability and predictive control for safe locomotion
 Pierre-Brice Wieber, 2008
 Ensemble Algorithms in Reinforcement Learning
 Marco Wiering and  Hado van Hasselt, 2008
 SATzilla: Portfolio-based Algorithm Selection for SAT
 Lin Xu,  Frank Hutter,  Holger H. Hoos, and  Kevin Leyton-Brown, 2008
 Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
 Engin İpek,  Onur Mutlu,  José Martínez, and  Rich Caruana, 2008
 Tuning Bandit Algorithms in Stochastic Environments
 Jean-Yves Audibert,  Rémi Munos, and  Csaba Szepesvári, 2007
 Sample Complexity of Policy Search with Known Dynamics
 Peter L. Bartlett and  Ambuj Tewari, 2007
 Distinguishing falls from normal ADL using vertical velocity profiles
 Alan K. Bourke,  Karol J. O'Donovan, and  Gearóid M. ÓLaighin, 2007
 An optimal planning of falling motions of a humanoid robot
 Kiyoshi Fujiwara,  Shuuji Kajita,  Kensuke Harada,  Kenji Kaneko,  Mitsuharu Morisawa,  Fumio Kanehiro,  Shinichiro Nakaoka, and  Hirohisa Hirukawa, 2007
 Bayesian actor-critic algorithms
 Mohammad Ghavamzadeh and  Yaakov Engel, 2007
 Bayesian Policy Gradient Algorithms
 Mohammad Ghavamzadeh and  Yaakov Engel, 2007
 Human-Robot Interaction: A Survey
 Michael A. Goodrich and  Alan C. Schultz, 2007
 Approximation Algorithms for Budgeted Learning Problems
 Sudipto Guha and  Kamesh Munagala, 2007
 Learning RoboCup-Keepaway with Kernels
 Tobias Jung and  Daniel Polani, 2007
 Batch Reinforcement Learning in a Complex Domain
 Shivaram Kalyanakrishnan and  Peter Stone, 2007
 Half Field Offense in RoboCup Soccer: A Multiagent Reinforcement Learning Case Study
 Shivaram Kalyanakrishnan,  Yaxin Liu, and  Peter Stone, 2007
 The UT Austin Villa 3D Simulation Soccer Team 2007
 Shivaram Kalyanakrishnan and  Peter Stone, 2007
 Recent advances in ranking and selection
 Seong-Hee Kim and  Barry L. Nelson, 2007
 Large Scale Reinforcement Learning using Q-Sarsa(λ) and Cascading Neural Networks
 Steffen Nissen, 2007
 Fall detection - Principles and Methods
 N. Noury,  A. Fleury,  P. Rumeau,  A. K. Bourke,  G. ÓLaighin,  V. Rialle, and  J.E. Lundy, 2007
 Falling Motion Control for Humanoid Robots While Walking
 Kunihiro Ogata,  Koji Terada, and  Yasuo Kuniyoshi, 2007
 Efficient Failure Detection on Mobile Robots Using Particle Filters with Gaussian Process Proposals
 Christian Plagemann,  Dieter Fox, and  Wolfram Burgard, 2007
 On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup
 Martin Riedmiller and  Thomas Gabel, 2007
 Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
 Martin Riedmiller,  Jan Peters, and  Stefan Schaal, 2007
 Autonomous blimp control using model-free reinforcement learning in a continuous state and action space
 Axel Rottmann,  Christian Plagemann,  Peter Hilgers, and  Wolfram Burgard, 2007
 Learning classifier systems: a survey
 Olivier Sigaud and  Stewart W. Wilson, 2007
 Reinforcement Learning of Local Shape in the Game of Go
 David Silver,  Richard S. Sutton, and  Martin Müller, 2007
 On the role of tracking in stationary environments
 Richard S. Sutton,  Anna Koop, and  David Silver, 2007
 Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man
 István Szita and  András Lőrincz, 2007
 Representation Transfer for Reinforcement Learning
 Matthew E. Taylor and  Peter Stone, 2007
 Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
 Matthew E. Taylor,  Peter Stone, and  Yaxin Liu, 2007
 On the use of hybrid reinforcement learning for autonomic resource allocation
 Gerald Tesauro,  Nicholas K. Jong,  Rajarshi Das, and  Mohamed N. Bennani, 2007
 Adaptive Representations for Reinforcement Learning
 Shimon Azariah Whiteson, 2007
 Piecewise-Linear Pattern Generator and Reflex System for Humanoid Robots
 Riadh Zaier and  Shinji Kanda, 2007
 See, walk, and kick: Humanoid robots start to play soccer
 Sven Behnke,  Michael Schreiber,  Jörg Stückler,  Reimund Renner, and  Hauke Strasdat, 2006
 Pattern Recognition and Machine Learning
 Christopher M. Bishop, 2006
 An empirical comparison of supervised learning algorithms
 Rich Caruana and  Alexandru Niculescu-Mizil, 2006
 Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
 Thomas Degris,  Olivier Sigaud, and  Pierre-Henri Wuillemin, 2006
 Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
 Eyal Even-Dar,  Shie Mannor, and  Yishay Mansour, 2006
 Tetris: A Study of Randomized Constraint Sampling
 Vivek F. Farias and  Benjamin Van Roy, 2006
 Towards an Optimal Falling Motion for a Humanoid Robot
 Kiyoshi Fujiwara,  Shuuji Kajita,  Kensuke Harada,  Kenji Kaneko,  Mitsuharu Morisawa,  Fumio Kanehiro,  Shinichiro Nakaoka, and  Hirohisa Hirukawa, 2006
 Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
 Abraham P. George and  Warren B. Powell, 2006
 Hierarchical multi-agent reinforcement learning
 Mohammad Ghavamzadeh,  Sridhar Mahadevan, and  Rajbala Makar, 2006
 Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot
 Kentarou Hitomi,  Tomohiro Shibata,  Yutaka Nakamura, and  Shin Ishii, 2006
 An Overview of Cooperative and Competitive Multiagent Learning
 Pieter Jan 't Hoen,  Karl Tuyls,  Liviu Panait,  Sean Luke, and  Johannes A. La Poutré, 2006
 Looping suffix tree-based inference of partially observable hidden state
 Michael P. Holmes and  Charles Lee Isbell, Jr, 2006
 Bandit Based Monte-Carlo Planning
 Levente Kocsis and  Csaba Szepesvári, 2006
 Evolving a Real-World Vehicle Warning System
 Nate Kohl,  Kenneth Stanley,  Risto Miikkulainen,  Michael Samples, and  Rini Sherony, 2006
 Stepping Motion for a Human-like Character to Maintain Balance against Large Perturbations
 Shunsuke Kudoh,  Taku Komura, and  Katsushi Ikeuchi, 2006
 Quadruped Robot Obstacle Negotiation via Reinforcement Learning
 Honglak Lee,  Yirong Shen,  Chih-Han Yu,  Gurjeet Singh, and  Andrew Y. Ng, 2006
 Relaxed fault detection and isolation: An application to a nonlinear case study
 Raffaella Mattone and  Alessandro De Luca, 2006
 Reinforcement learning for optimized trade execution
 Yuriy Nevmyvaka,  Yi Feng, and  Michael Kearns, 2006
 Balance Control of a Humanoid Robot Based on the Reaction Null Space Method
 Akinori Nishio,  Kentaro Takahashi, and  Dragomir N. Nenchev, 2006
 Anytime Point-Based Approximations for Large POMDPs
 Joelle Pineau,  Geoffrey J. Gordon, and  Sebastian Thrun, 2006
 Capture Point: A Step toward Humanoid Push Recovery
 Jerry Pratt,  John Carff,  Sergey Drakunov, and  Ambarish Goswami, 2006
 Instability Detection and Fall Avoidance for a Humanoid using Attitude Sensors and Reflexes
 Reimund Renner and  Sven Behnke, 2006
 Integrating Techniques from Statistical Ranking into Evolutionary Algorithms
 Christian Schmidt,  Jürgen Branke, and  Stephen E. Chick, 2006
 Keepaway Soccer: From Machine Learning Testbed to Benchmark
 Peter Stone,  Gregory Kuhlmann,  Matthew E. Taylor, and  Yaxin Liu, 2006
 PAC model-free reinforcement learning
 Alexander L. Strehl,  Lihong Li,  Eric Wiewiora,  John Langford, and  Michael L. Littman, 2006
 Getting Back on Two Feet: Reliable Standing-up Routines for a Humanoid Robot
 Jörg Stückler,  Johannes Schwenk, and  Sven Behnke, 2006
 Learning Tetris using the noisy cross-entropy method
 István Szita and  András Lőrincz, 2006
 Evolutionary Function Approximation for Reinforcement Learning
 Shimon Whiteson and  Peter Stone, 2006
 On-line evolutionary computation for reinforcement learning in stochastic domains
 Shimon Whiteson and  Peter Stone, 2006
 An Evolutionary Approach to Tetris
 Niko Böhm,  Gabriella Kókai, and  Stefan Mandl, 2005
 An Adaptive Sampling Algorithm for Solving Markov Decision Processes
 Hyeong Soo Chang,  Michael C. Fu,  Jiaqiao Hu, and  Steven I. Marcus, 2005
 Tree-Based Batch Mode Reinforcement Learning
 Damien Ernst,  Pierre Geurts, and  Louis Wehenkel, 2005
 Sensory reflex control for humanoid walking
 Qiang Huang and  Yoshihiko Nakamura, 2005
 Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
 Terran Lane and  William D. Smart, 2005
 Evolving a Neural Network Location Evaluator to Play Ms. Pac-Man
 Simon M. Lucas, 2005
 Basis Function Adaptation in Temporal Difference Reinforcement Learning
 Ishai Menache,  Shie Mannor, and  Nahum Shimkin, 2005
 Spark - A generic simulator for physical multi-agent simulations
 Oliver Obst and  Markus Rollman, 2005
 Cooperative Multi-Agent Learning: The State of the Art
 Liviu Panait and  Sean Luke, 2005
 Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
 Martin Riedmiller, 2005
 Function Approximation via Tile Coding: Automating Parameter Choice
 Alexander A. Sherstov and  Peter Stone, 2005
 Reinforcement Learning for RoboCup-Soccer Keepaway
 Peter Stone,  Richard S. Sutton, and  Gregory Kuhlmann, 2005
 A theoretical analysis of Model-Based Interval Estimation
 Alexander L. Strehl and  Michael L. Littman, 2005
 Zero-Moment Point - Thirty Five Years of its Life
 Miomir Vukobratović and  Branislav Borovac, 2005
 Evolving Soccer Keepaway Players Through Task Decomposition
 Shimon Whiteson,  Nate Kohl,  Risto Miikkulainen, and  Peter Stone, 2005
 Data Mining: Practical machine learning tools and techniques
 Ian H. Witten and  Eibe Frank, 2005
 A Tutorial on the Cross-Entropy Method
 Pieter-Tjerk de Boer,  Dirk P. Kroese,  Shie Mannor, and  Reuven Y. Rubinstein, 2005
 Sequential Sampling in Noisy Environments
 Jürgen Branke and  Christian Schmidt, 2004
 Tetris is hard, even to approximate
 Ron Breukelaar,  Erik D. Demaine,  Susan Hohenberger,  Hendrik Jan Hoogeboom,  Walter A. Kosters, and  David Liben-Nowell, 2004
 Failure diagnosis using decision trees
 Mike Chen,  Alice X. Zheng,  Jim Lloyd,  Michael I. Jordan, and  Eric Brewer, 2004
 Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis
 Claudia V. Goldman and  Shlomo Zilberstein, 2004
 Machine Learning for Fast Quadrupedal Locomotion
 Nate Kohl and  Peter Stone, 2004
 Sparse cooperative Q-learning
 Jelle R. Kok and  Nikos Vlassis, 2004
 Reinforcement learning for sensing strategies
 Cody Kwok and  Dieter Fox, 2004
 Distinctive Image Features from Scale-Invariant Keypoints
 David G. Lowe, 2004
 Active Model Selection
 Omid Madani,  Daniel J. Lizotte, and  Russell Greiner, 2004
 The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
 Shie Mannor and  John N. Tsitsiklis, 2004
 Convergence of synchronous reinforcement learning with linear function approximation
 Artur Merke and  Ralf Schoknecht, 2004
 Webots™: Professional Mobile Robot Simulation
 Olivier Michel, 2004
 Autonomous Helicopter Flight via Reinforcement Learning
 Andrew Y. Ng,  H. Jin Kim,  Michael I. Jordan, and  Shankar Sastry, 2004
 On the Numeric Stability of Gaussian Processes Regression for Relational Reinforcement Learning
 Jan Ramon and  Kurt Driessens, 2004
 Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning
 Bohdana Ratitch and  Doina Precup, 2004
 Multi-Agent Patrolling with Reinforcement Learning
 Hugo Santana,  Geber Ramalho,  Vincent Corruble, and  Bohdana Ratitch, 2004
 Temporal difference models describe higher-order learning in humans
 Ben Seymour,  John P. O'Doherty,  Peter Dayan,  Martin Koltzenburg,  Anthony K. Jones,  Raymond J. Dolan,  Karl J. Friston, and  Richard S. Frackowiak, 2004
 Efficient Evolution of Neural Networks Through Complexification
 Kenneth Owen Stanley, 2004
 Stochastic policy gradient reinforcement learning on a simple 3D biped
 Russ Tedrake,  Teresa Weirui Zhang, and  H. Sebastian Seung, 2004
 GenSo-EWS: a novel neural-fuzzy based early warning system for predicting bank failures
 W. L. Tung,  C. Quek, and  P. Cheng, 2004
 Adaptive Job Routing and Scheduling
 Shimon Whiteson and  Peter Stone, 2004
 A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations
 Bram Bakker,  Viktor Zhumatiy,  Gabriel Gruener, and  Jürgen Schmidhuber, 2003
 Using Ranking and Selection to Clean Up after Simulation Optimization
 Justin Boesel,  Barry L. Nelson, and  Seong-Hee Kim, 2003
 R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
 Ronen I. Brafman and  Moshe Tennenholtz, 2003
 Users Manual: RoboCup Soccer Server - for Soccer Server Version 7.07 and Later
 Mao Chen,  Klaus Dorer,  Ehsan Foroughi,  Fredrick Heintz,  ZhanXiang Huang,  Spiros Kapetanakis,  Kostas Kostiadis,  Johan Kummeneje,  Jan Murray,  Itsuki Noda,  Oliver Obst,  Pat Riley,  Timo Steffens,  Yi Wang, and  Xiang Yin, 2003
 SPEEDY: A Fall Detector in a Wrist Watch
 Thomas Degen,  Heinz Jaeckel,  Michael Rufer, and  Stefan Wyss, 2003
 Learning to play Pac-Man: An Evolutionary, Rule-based Approach
 Marcus Gallagher and  Amanda Ryan, 2003
 Active Guidance for a Finless Rocket Using Neuroevolution
 Faustino J. Gomez and  Risto Miikkulainen, 2003
 Biped walking pattern generation by a simple three-dimensional inverted pendulum model
 Shuuji Kajita,  Fumio Kanehiro,  Kenji Kaneko,  Kiyoshi Fujiwara,  Kazuhito Yokoi, and  Hirohisa Hirukawa, 2003
 Survey of Intelligent Control Techniques for Humanoid Robots
 Duško Katić and  Miomir Vukobratović, 2003
 On Actor-Critic Algorithms
 Vijay R. Konda and  John N. Tsitsiklis, 2003
 Least-Squares Policy Iteration
 Michail G. Lagoudakis and  Ronald Parr, 2003
 Reinforcement Learning as Classification: Leveraging Modern Classifiers
 Michail G. Lagoudakis and  Ronald Parr, 2003
 Boosting as a Metaphor for Algorithm Design
 Kevin Leyton-Brown,  Eugene Nudelman,  Galen Andrew,  Jim McFadden, and  Yoav Shoham, 2003
 Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
 Shie Mannor and  John N. Tsitsiklis, 2003
 Least Squares Policy Evaluation Algorithms with Linear Function Approximation
 A. Nedić and  D. P. Bertsekas, 2003
 A Convergent Form of Approximate Policy Iteration
 Theodore J. Perkins and  Doina Precup, 2003
 Using MDP Characteristics to Guide Exploration in Reinforcement Learning
 Bohdana Ratitch and  Doina Precup, 2003
 Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
 Ralf Schoknecht, 2003
 An Agent that Learns to Play Pacman
 Donald Shepherd, 2003
 Introduction to Stochastic Search and Optimization
 James C. Spall, 2003
 Monitoring and early warning for Internet worms
 Cliff Changchun Zou,  Lixin Gao,  Weibo Gong, and  Don Towsley, 2003
 Scaling Internal-State Policy-Gradient Methods for POMDPs
 Douglas Aberdeen and  Jonathan Baxter, 2002
 Finite-time Analysis of the Multiarmed Bandit Problem
 Peter Auer,  Nicolò Cesa-Bianchi, and  Paul Fischer, 2002
 The Nonstochastic Multiarmed Bandit Problem
 Peter Auer,  Nicolò Cesa-Bianchi,  Yoav Freund, and  Robert E. Schapire, 2002
 Threshold selection, hypothesis tests, and DOE methods
 Thomas Beielstein and  Sandor Markon, 2002
 The Complexity of Decentralized Control of Markov Decision Processes
 Daniel S. Bernstein,  Robert Givan,  Neil Immerman, and  Shlomo Zilberstein, 2002
 An ε-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes
 Blai Bonet, 2002
 Technical Update: Least-Squares Temporal Difference Learning
 Justin A. Boyan, 2002
 Deep Blue
 Murray Campbell,  A. Joseph Hoane Jr., and  Feng-hsiung Hsu, 2002
 PAC Bounds for Multi-armed Bandit and Markov Decision Processes
 Eyal Even-Dar,  Shie Mannor, and  Yishay Mansour, 2002
 Optimization for simulation: Theory vs. Practice
 Michael C. Fu, 2002
 UKEMI: Falling motion control to minimize damage to biped humanoid robot
 Kiyoshi Fujiwara,  Fumio Kanehiro,  Shuji Kajita,  Kenji Kaneko,  Kazuhito Yokoi, and  Hirohisa Hirukawa, 2002
 Coordinated Reinforcement Learning
 Carlos Guestrin,  Michail G. Lagoudakis, and  Ronald Parr, 2002
 Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
 Shanti S. Gupta and  S. Panchapakesan, 2002
 Mining complex models from arbitrarily large databases in constant time
 Geoff Hulten and  Pedro Domingos, 2002
 Discriminative, Generative and Imitative learning
 Tony Jebara, 2002
 Approximately Optimal Approximate Reinforcement Learning
 Sham Kakade and  John Langford, 2002
 Near-Optimal Reinforcement Learning in Polynomial Time
 Michael Kearns and  Satinder Singh, 2002
 A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
 Michael Kearns,  Yishay Mansour, and  Andrew Y. Ng, 2002
 Least-Squares Methods in Reinforcement Learning for Control
 Michail G. Lagoudakis,  Ronald Parr, and  Michael L. Littman, 2002
 Variable Resolution Discretization in Optimal Control
 Rémi Munos and  Andrew Moore, 2002
 Balance control analysis of humanoid robot based on ZMP feedback control
 Napoleon,  Shigeki Nakaura, and  Mitsuji Sampei, 2002
 Kernel-Based Reinforcement Learning
 Dirk Ormoneit and  Śaunak Sen, 2002
 On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains
 Theodore J. Perkins and  Mark D. Pendrith, 2002
 Reinforcement Learning for POMDPs Based on Action Values and Stochastic Optimization
 Theodore J. Perkins, 2002
 Learning from Scarce Experience
 Leonid Peshkin and  Christian R. Shelton, 2002
 Characterizing Markov Decision Processes
 Bohdana Ratitch and  Doina Precup, 2002
 The intelligent ASIMO: system overview and integration
 Yoshiaki Sakagami,  Ryujin Watanabe,  Chiaki Aoyama,  Shinichi Matsunaga,  Nobuo Higaki, and  Kikuo Fujimura, 2002
 A Perspective View and Survey of Meta-Learning
 Ricardo Vilalta and  Youssef Drissi, 2002
 On the stability of walking systems
 Pierre-Brice Wieber, 2002
 Evolution strategies in noisy environments - a survey of existing work
 D. V. Arnold, 2001
 Scaling to Very Very Large Corpora for Natural Language Disambiguation
 Michele Banko and  Eric Brill, 2001
 Infinite-Horizon Policy-Gradient Estimation
 Jonathan Baxter and  Peter L. Bartlett, 2001
 Random Forests
 Leo Breiman, 2001
 Batch Value Function Approximation via Support Vectors
 Thomas G. Dietterich and  Xin Wang, 2001
 Convergence of Optimistic and Incremental Q-Learning
 Eyal Even-Dar and  Yishay Mansour, 2001
 Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State
 Matthew R. Glickman and  Katia Sycara, 2001
 Algorithm portfolios
 Carla P. Gomes and  Bart Selman, 2001
 Max-norm Projections for Factored MDPs
 Carlos Guestrin,  Daphne Koller, and  Ronald Parr, 2001
 Multiagent Planning with Factored MDPs
 Carlos Guestrin,  Daphne Koller, and  Ronald Parr, 2001
 AutoBalancer: An Online Dynamic Balance Compensation Scheme for Humanoid Robots
 Satoshi Kagami,  Fumio Kanehiro,  Yukiharu Tamiya,  Masayuki Inaba, and  Hirochika Inoue, 2001
 A Natural Policy Gradient
 Sham Kakade, 2001
 A fully sequential procedure for indifference-zone selection in simulation
 Seong-Hee Kim and  Barry L. Nelson, 2001
 Thresholding - a selection operator for noisy ES
 Sandor Markon,  Dirk V. Arnold,  Thomas Bäck,  Thomas Beielstein, and  Hans-Georg Beyer, 2001
 Learning to trade via direct reinforcement
 John Moody and  Matthew Saffell, 2001
 On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
 Andrew Y. Ng and  Michael I. Jordan, 2001
 Off-Policy Temporal Difference Learning with Function Approximation
 Doina Precup,  Richard S. Sutton, and  Sanjoy Dasgupta, 2001
 On the Convergence of Temporal-Difference Learning with Linear Function Approximation
 Vladislav Tadić, 2001
 Reinforcement Learning in POMDP's via Direct Gradient Ascent
 Jonathan Baxter and  Peter L. Bartlett, 2000
 Evolutionary algorithms in noisy environments: theoretical issues and guidelines for practice
 Hans-Georg Beyer, 2000
 Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
 Thomas G. Dietterich, 2000
 Mining high-speed data streams
 Pedro Domingos and  Geoff Hulten, 2000
 Planning treatment of ischemic heart disease with partially observable Markov decision processes
 Milos Hauskrecht and  Hamish Fraser, 2000
 Value-Function Approximations for Partially Observable Markov Decision Processes
 Milos Hauskrecht, 2000
 Local Search Algorithms for SAT: An Empirical Evaluation
 Holger H. Hoos and  Thomas Stützle, 2000
 Policy Iteration for Factored MDPs
 Daphne Koller and  Ronald Parr, 2000
 Policy Search via Density Estimation
 Andrew Y. Ng,  Ronald Parr, and  Daphne Koller, 2000
 PEGASUS: A policy search method for large MDPs and POMDPs
 Andrew Y. Ng and  Michael Jordan, 2000
 Meta-Learning by Landmarking Various Learning Algorithms
 Bernhard Pfahringer,  Hilan Bensusan, and  Christophe Giraud-Carrier, 2000
 Exploiting Inherent Robustness and Natural Dynamics in the Control of Bipedal Walking Robots
 Jerry E. Pratt, 2000
 Eligibility Traces for Off-Policy Policy Evaluation
 Doina Precup,  Richard S. Sutton, and  Satinder P. Singh, 2000
 Optimization of Noisy Fitness Functions by Means of Genetic Algorithms Using History of Search
 Yasuhito Sano and  Hajime Kita, 2000
 Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
 Satinder Singh,  Tommi Jaakkola,  Michael L. Littman, and  Csaba Szepesvári, 2000
 Policy Gradient Methods for Reinforcement Learning with Function Approximation
 Richard S. Sutton,  David A. McAllester,  Satinder P. Singh, and  Yishay Mansour, 2000
 Monte Carlo POMDPs
 Sebastian Thrun, 2000
 Gradient Descent for General Reinforcement Learning
 Leemon Baird and  Andrew Moore, 1999
 An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
 Eric Bauer and  Ron Kohavi, 1999
 Reinforcement Learning for Control of Self-Similar Call Traffic in Broadband Networks
 Jakob Carlström and  Ernst Nordström, 1999
 Activity Monitoring: Noticing Interesting Changes in Behavior
 Tom Fawcett and  Foster Provost, 1999
 Selecting and Ordering Populations: A New Statistical Methodology
 Jean Dickinson Gibbons,  Ingram Olkin, and  Milton Sobel, 1999
 Solving Non-Markovian Control Tasks with Neuro-Evolution
 Faustino J. Gomez and  Risto Miikkulainen, 1999
 An empirical evaluation of several methods to select the best system
 Koichiro Inoue,  Stephen E. Chick, and  Chun-Hung Chen, 1999
 Evolutionary Algorithms for Reinforcement Learning
 David E. Moriarty,  Alan C. Schultz, and  John J. Grefenstette, 1999
 Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
 Andrew Y. Ng,  Daishi Harada, and  Stuart J. Russell, 1999
 Convergence of Reinforcement Learning With General Function Approximators
 Vassilis A. Papavassiliou and  Stuart Russell, 1999
 Reinforcement Learning Using Approximate Belief States
 Andrés Rodríguez,  Ronald Parr, and  Daphne Koller, 1999
 Distributed Value Functions
 Jeff Schneider,  Weng-Keen Wong,  Andrew Moore, and  Martin Riedmiller, 1999
 Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
 Richard S. Sutton,  Doina Precup, and  Satinder P. Singh, 1999
 On-Line New Event Detection and Tracking
 James Allan,  Ron Papka, and  Victor Lavrenko, 1998
 Learning hierarchical control structures for multiple tasks and changing environments
 Bruce L. Digney, 1998
 Robot Shaping: An Experiment in Behavior Engineering
 Marco Dorigo and  Marco Colombetti, 1998
 Neural Networks: A Comprehensive Foundation
 Simon Haykin, 1998
 Symposium on Applications of Reinforcement Learning: Final Report for NSF Grant IIS-9810208
 Pat Langley and  Mark Pendrith, 1998
 Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
 John Loch and  Satinder Singh, 1998
 Q2: Memory-Based Active Learning for Optimizing Noisy Continuous Functions
 Andrew W. Moore,  Jeff G. Schneider,  Justin A. Boyan, and  Mary S. Lee, 1998
 Hierarchical Control and Learning for Markov Decision Processes
 Ronald Edward Parr, 1998
 An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
 Mark D. Pendrith and  Michael J. McGarity, 1998
 Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
 Jette Randløv and  Preben Alstrøm, 1998
 Averaging Efficiently in the Presence of Noise
 Peter Stagge, 1998
 Layered Learning in Multi-Agent Systems
 Peter Stone, 1998
 Reinforcement Learning: An Introduction
 Richard S. Sutton and  Andrew G. Barto, 1998
 Learning and Value Function Approximation in Complex Decision Processes
 Benjamin Van Roy, 1998
 A Comparison of Direct and Model-Based Reinforcement Learning
 Christopher G. Atkeson and  Juan Carlos Santamaría, 1997
 How to Lose at Tetris
 Heidi Burgiel, 1997
 Multitask Learning
 Rich Caruana, 1997
 Machine-Learning Research: Four Current Directions
 Thomas G. Dietterich, 1997
 The Racing Algorithm: Model Selection for Lazy Learners
 Oded Maron and  Andrew W. Moore, 1997
 Reinforcement Learning in the Multi-Robot Domain
 Maja J. Matarić, 1997
 Alarm effectiveness in driver-centred collision-warning systems
 R. Parasuraman,  P. A. Hancock, and  O. Olofinboba, 1997
 Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
 Satinder Singh and  Dimitri Bertsekas, 1997
 An analysis of temporal-difference learning with function approximation
 John N. Tsitsiklis and  Benjamin Van Roy, 1997
 No free lunch theorems for optimization
 David H. Wolpert and  William G. Macready, 1997
 Exponentially many local minima for single neurons
 Peter Auer,  Mark Herbster, and  Manfred K. Warmuth, 1996
 Neuro-Dynamic Programming
 Dimitri P. Bertsekas and  John N. Tsitsiklis, 1996
 Linear Least-Squares Algorithms for Temporal Difference Learning
 Steven J. Bradtke and  Andrew G. Barto, 1996
 Improving Elevator Performance Using Reinforcement Learning
 Robert H. Crites and  Andrew G. Barto, 1996
 Experiments with a New Boosting Algorithm
 Yoav Freund and  Robert E. Schapire, 1996
 Stable Fitted Reinforcement Learning
 Geoffrey J. Gordon, 1996
 Simulated Annealing for noisy cost functions
 Walter J. Gutjahr and  Georg Ch. Pflug, 1996
 Reinforcement Learning with Selective Perception and Hidden State
 Andrew Kachites McCallum, 1996
 Genetic Algorithms, Selection Schemes, and the Varying Effects of Noise
 Brad L. Miller and  David E. Goldberg, 1996
 Memory-based Stochastic Optimization
 Andrew W. Moore and  Jeff Schneider, 1996
 Incremental Multi-Step Q-Learning
 Jing Peng and  Ronald J. Williams, 1996
 Bagging, Boosting, and C4.5
 J. Ross Quinlan, 1996
 Evolution-Based Discovery of Hierarchical Behaviors
 Justinian P. Rosca and  Dana H. Ballard, 1996
 Reinforcement learning with replacing eligibility traces
 Satinder P. Singh and  Richard S. Sutton, 1996
 Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
 Richard S. Sutton, 1996
 Feature-based methods for large scale dynamic programming
 John N. Tsitsiklis and  Benjamin Van Roy, 1996
 Residual Algorithms: Reinforcement Learning with Function Approximation
 Leemon Baird, 1995
 Design and analysis of experiments for statistical selection, screening, and multiple comparisons
 Robert E. Bechhofer,  Thomas J. Santner, and  David M. Goldsman, 1995
 A Counterexample to Temporal Differences Learning
 Dimitri P. Bertsekas, 1995
 Generalization in Reinforcement Learning: Safely Approximating the Value Function
 Justin A. Boyan and  Andrew W. Moore, 1995
 Recursive Automatic Bias Selection for Classifier Construction
 Carla E. Brodley, 1995
 Stable Function Approximation in Dynamic Programming
 Geoffrey J. Gordon, 1995
 Evaluation and Selection of Biases in Machine Learning
 Diana F. Gordon and  Marie desJardins, 1995
 Strongly Typed Genetic Programming in Evolving Cooperation Strategies
 Thomas Haynes,  Roger L. Wainwright,  Sandip Sen, and  Dale A. Schoenefeld, 1995
 Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
 Tommi Jaakkola,  Satinder P. Singh, and  Michael I. Jordan, 1995
 Applications of machine learning and rule induction
 Pat Langley and  Herbert A. Simon, 1995
 On the Complexity of Solving Markov Decision Problems
 Michael L. Littman,  Thomas L. Dean, and  Leslie Pack Kaelbling, 1995
 Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State
 R. Andrew McCallum, 1995
 The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces
 Andrew W. Moore and  Christopher G. Atkeson, 1995
 Approximating Optimal Policies for Partially Observable Stochastic Domains
 Ronald Parr and  Stuart Russell, 1995
 Methods for Competitive Co-Evolution: Finding Opponents Worth Beating
 Christopher D. Rosin and  Richard K. Belew, 1995
 Problem Solving with Reinforcement Learning
 Gavin Adrian Rummery, 1995
 Sequential PAC Learning
 Dale Schuurmans and  Russell Greiner, 1995
 Artificial Intelligence: An Empirical Science
 Herbert A. Simon, 1995
 Reinforcement Learning with Soft State Aggregation
 Satinder P. Singh,  Tommi Jaakkola, and  Michael I. Jordan, 1995
 A Reinforcement Learning Approach to job-shop Scheduling
 Wei Zhang and  Thomas G. Dietterich, 1995
 Acting optimally in partially observable stochastic domains
 Anthony R. Cassandra,  Leslie Pack Kaelbling, and  Michael L. Littman, 1994
 Using a Genetic Algorithm to Search for the Representational Bias of a Collective Reinforcement Learner
 Helen G. Cobb and  Peter Bock, 1994
 TD(λ) Converges with Probability 1
 Peter Dayan and  Terrence J. Sejnowski, 1994
 An Introduction to Computational Learning Theory
 Michael J. Kearns and  Umesh V. Vazirani, 1994
 Markov Decision Processes
 Martin L. Puterman, 1994
 On-line Q-learning using connectionist systems
 G. A. Rummery and  M. Niranjan, 1994
 Learning Without State-Estimation in Partially Observable Markovian Decision Processes
 Satinder P. Singh,  Tommi Jaakkola, and  Michael I. Jordan, 1994
 An Upper Bound on the Loss from Approximate Optimal-Value Functions
 Satinder P. Singh and  Richard C. Yee, 1994
 On bias and step size in temporal-difference learning
 Richard S. Sutton and  Satinder P. Singh, 1994
 Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
 Ronald J. Williams and  Leemon C. Baird III, 1994
 Reinforcement Learning Applied to Linear Quadratic Regulation
 Steven J. Bradtke, 1993
 Benchmarks, Test Beds, Controlled Experimentation, and the Design of Agent Architectures
 Steve Hanks,  Martha E. Pollack, and  Paul R. Cohen, 1993
 Reinforcement learning with hidden states
 Long-Ji Lin and  Tom M. Mitchell, 1993
 An Optimization-based Categorization of Reinforcement Learning Environments
 Michael L. Littman, 1993
 Overcoming Incomplete Perception with Utile Distinction Memory
 R. Andrew McCallum, 1993
 Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
 Andrew W. Moore and  Christopher G. Atkeson, 1993
 Efficient learning and planning within the Dyna framework
 Jing Peng and  Ronald J. Williams, 1993
 Approximating Q-Values with Basis Function Representations
 Philip Sabes, 1993
 Online Learning with Random Representations
 Richard S. Sutton and  Steven D. Whitehead, 1993
 Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents
 Ming Tan, 1993
 Issues in Using Function Approximation for Reinforcement Learning
 Sebastian Thrun and  Anton Schwartz, 1993
 Interactions between Learning and Evolution
 David Ackley and  Michael Littman, 1992
 Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
 Lonnie Chrisman, 1992
 Inductive Biases in a Reinforcement Learner
 Helen G. Cobb, 1992
 The Convergence of TD(λ) for General λ
 Peter Dayan, 1992
 Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
 Long-Ji Lin, 1992
 Practical Issues in Temporal Difference Learning
 Gerald Tesauro, 1992
 Q-Learning
 Christopher J. C. H. Watkins and  Peter Dayan, 1992
 Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
 Ronald J. Williams, 1992
 Viability Theory
 Jean-Pierre Aubin, 1991
 Intelligence without Representation
 Rodney A. Brooks, 1991
 Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control
 Ming Tan, 1991
 Predicting Bank Failures in the 1980s
 James B. Thomson, 1991
 A Proportional Hazards Model of Bank Failure: An Examination of its Usefulness as an Early Warning Tool
 Gary Whalen, 1991
 Learning to perceive and act by trial and error
 Steven D. Whitehead and  Dana H. Ballard, 1991
 Learning Sequential Decision Rules Using Simulation Models and Competition
 John J. Grefenstette,  Connie Loggia Ramsey, and  Alan C. Schultz, 1990
 Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
 Richard S. Sutton, 1990
 Multilayer feedforward networks are universal approximators
 Kurt Hornik,  Maxwell B. Stinchcombe, and  Halbert White, 1989
 Restricted Subset Selection Procedures for Simulation
 David W. Sullivan and  James R. Wilson, 1989
 An algorithm for automated tsunami warning in French Polynesia based on mantle magnitudes
 Jacques Talandier and  Emile A. Okal, 1989
 Learning from Delayed Rewards
 Christopher John Cornish Hellaby Watkins, 1989
 How Evaluation Guides AI Research: The Message Still Counts More than the Medium
 Paul R. Cohen and  Adele E. Howe, 1988
 Genetic algorithms in noisy environments
 J. Michael Fitzpatrick and  John J. Grefenstette, 1988
 Survey of model-based failure detection and isolation in complex plants
 J. J. Gertler, 1988
 Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework
 David Haussler, 1988
 Machine Learning as an Experimental Science
 Pat Langley, 1988
 Learning to Predict By the Methods of Temporal Differences
 Richard S. Sutton, 1988
 Further Real Applications of Markov Decision Processes
 Douglas J. White, 1988
 Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm
 Nick Littlestone, 1987
 On Optimal Cooperation of Knowledge Sources - An Empirical Investigation
 M. Benda,  V. Jagannathan, and  R. Dodhiawala, 1986
 Shift of Bias for Inductive Concept Learning
 Paul E. Utgoff, 1986
 Bandit problems
 Donald A. Berry and  Bert Fristedt, 1985
 A procedure for selecting a subset of size m containing the l best of k independent normal populations, with applications to simulation
 Lloyd W. Koenig and  Averill M. Law, 1985
 Asymptotically Efficient Adaptive Allocation Rules
 T. L. Lai and  Herbert Robbins, 1985
 Real Applications of Markov Decision Processes
 Douglas J. White, 1985
 Neuronlike adaptive elements that can solve difficult learning control problems
 Andrew G. Barto,  Richard S. Sutton, and  Charles W. Anderson, 1983
 A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
 George E. Monahan, 1982
 Brains, Behavior and Robotics
 James Sacra Albus, 1981
 The Need for Biases in Learning Generalizations
 Tom M. Mitchell, 1980
 Early Warning Indicators of Business Failure
 Subhash Sharma and  Vijay Mahajan, 1980
 The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs
 Edward J. Sondik, 1978
 Determining Sample Size for Pretesting Comparative Effectiveness of Advertising Copies
 Siddhartha R. Dalal and  V. Srinivasan, 1977
 Sequential models for clinical trials
 Herman Chernoff, 1967
 Optimal Control of Markov Processes with Incomplete State Information
 K. J. Åström, 1965
 A Sequential Procedure for Selecting the Population with the Largest Mean from k Normal Populations
 Edward Paulson, 1964
 Probability Inequalities for Sums of Bounded Random Variables
 Wassily Hoeffding, 1963
 The Future of Data Analysis
 John W. Tukey, 1962
 Comparing entries in random sample tests
 W. A. Becker, 1961
 A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs
 Robert E. Bechhofer, 1958
 Dynamic Programming
 Richard Bellman, 1957
 Some aspects of the sequential design of experiments
 Herbert Robbins, 1952
 Sequential Analysis
 Abraham Wald, 1947
 Contributions to the Theory of Sequential Analysis. I
 M. A. Girshick, 1946
 Contributions to the Theory of Sequential Analysis, II, III
 M. A. Girshick, 1946