Shivaram's Reading List


All

Almost Optimal Exploration in Multi-Armed Bandits
Zohar Karnin, Tomer Koren, and Oren Somekh, 2013

Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric, 2012

Planning in Reward-Rich Domains via PAC Bandits
Sergiu Goschin, Ari Weinstein, Michael L. Littman, and Erick Chastain, 2012

Learning Methods for Sequential Decision Making with Imperfect Representations
Shivaram Kalyanakrishnan, 2011

Learning to Predict Humanoid Fall
Shivaram Kalyanakrishnan and Ambarish Goswami, 2011

Characterizing reinforcement learning methods through parameterized learning problems
Shivaram Kalyanakrishnan and Peter Stone, 2011

On Learning with Imperfect Representations
Shivaram Kalyanakrishnan and Peter Stone, 2011

On Optimizing Interdependent Skills: A Case Study in Simulated 3D Humanoid Robot Soccer
Daniel Urieli, Patrick MacAlpine, Shivaram Kalyanakrishnan, Yinon Bentor, and Peter Stone, 2011

Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning
Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone, 2011

Exploiting Best-Match Equations for Efficient Reinforcement Learning
Harm van Seijen, Shimon Whiteson, Hado van Hasselt, and Marco Wiering, 2011

Insights in Reinforcement Learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
Hado Philip van Hasselt, 2011

Success, strategy and skill: an experimental study
Christopher Archibald, Alon Altman, and Yoav Shoham, 2010

Best Arm Identification in Multi-Armed Bandits
Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos, 2010

UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
Peter Auer and Ronald Ortner, 2010

Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda
Carlton Downey and Scott Sanner, 2010

A Brief Survey of Parametric Value Function Approximation
Matthieu Geist and Olivier Pietquin, 2010

Simulation optimization using the cross-entropy method with optimal computing budget allocation
Donghai He, Loo Hay Lee, Chun-Hung Chen, Michael C. Fu, and Segev Wasserkrug, 2010

An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Junya Honda and Akimichi Takemura, 2010

Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010

Non-Stochastic Bandit Slate Problems
Satyen Kale, Lev Reyzin, and Robert E. Schapire, 2010

Predicting Falls of a Humanoid Robot through Machine Learning
Shivaram Kalyanakrishnan and Ambarish Goswami, 2010

Three Humanoid Soccer Platforms: Comparison and Synthesis
Shivaram Kalyanakrishnan, Todd Hester, Michael Quinlan, Yinon Bentor, and Peter Stone, 2010

Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010

Learning Complementary Multiagent Behaviors: A Case Study
Shivaram Kalyanakrishnan and Peter Stone, 2010

Fall Detection of Two-legged Walking Robots using Multi-way Principal Components Analysis
J. G. Daniël Karssen and Martijn Wisse, 2010

Regret bounds for sleeping experts and bandits
Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma, 2010

Finite-Sample Analysis of LSTD
Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos, 2010

A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2010

Estimating Learning Rates in Evolution and TDL: Results on a Simple Grid-World Problem
Simon M. Lucas, 2010

Toward Off-Policy Learning Control with Function Approximation
Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, and Richard S. Sutton, 2010

Biped Walk Learning Through Playback and Corrective Demonstration
Çetin Meriçli and Manuela Veloso, 2010

Generalized Direction Changing Fall Control of Humanoid Robots Among Multiple Objects
Umashankar Nagarajan and Ambarish Goswami, 2010

Relative Entropy Policy Search
Jan Peters, Katharina Mülling, and Yasemin Altün, 2010

Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes
Marek Petrik, Gavin Taylor, Ron Parr, and Shlomo Zilberstein, 2010

Biped Walking using Coronal and Sagittal Movements based on Truncated Fourier Series
Nima Shafii, Luis Paulo Reis, and Nuno Lau, 2010

Application of Machine Learning To Epileptic Seizure Detection
Ali Shoeb and John Guttag, 2010

Algorithms for Reinforcement Learning
Csaba Szepesvári, 2010

SZ-Tetris as a Benchmark for Studying Key Problems of Reinforcement Learning
István Szita and Csaba Szepesvári, 2010

Model-based reinforcement learning with nearly tight exploration complexity bounds
István Szita and Csaba Szepesvári, 2010

Reinforcement learning of motor skills in high dimensions: A path integral approach
Evangelos Theodorou, Jonas Buchli, and Stefan Schaal, 2010

Improvements on Learning Tetris with Cross-Entropy
Christophe Thierry and Bruno Scherrer, 2010

Building Controllers for Tetris
Christophe Thierry and Bruno Scherrer, 2010

$\epsilon$-First Policies for Budget-Limited Multi-Armed Bandits
Long Tran-Thanh, Archie Chapman, Enrique Munoz de Cote, Alex Rogers, and Nicholas R. Jennings, 2010

Machine Learning Approach to the Detection of Fetal Hypoxia during Labor and Delivery
Philip A. Warrick, Emily F. Hamilton, Robert E. Kearney, and Doina Precup, 2010

Critical Factors in the Empirical Performance of Temporal Difference and Evolutionary Methods for Reinforcement Learning
Shimon Whiteson, Matthew E. Taylor, and Peter Stone, 2010

Fall Detection and Management in Biped Humanoid Robots
Javier Ruiz-del-Solar, Javier Moya, and Isao Parra-Tsunekawa, 2010

Modeling billiards games
Christopher Archibald and Yoav Shoham, 2009

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2009

On the Evolution of Artificial Tetris Players
Amine Boumaza, 2009

Pure Exploration in Multi-armed Bandits Problems
Sébastien Bubeck, Rémi Munos, and Gilles Stoltz, 2009

Combinatorial Bandits
Nicolò Cesa-Bianchi and Gábor Lugosi, 2009

The adaptive $k$-meteorologists problem and its application to structure learning and feature selection in reinforcement learning
Carlos Diuk, Lihong Li, and Bethany R. Leffler, 2009

Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem
Damien Ernst, Mevludin Glavic, Florin Capitanescu, and Louis Wehenkel, 2009

The Knowledge-Gradient Policy for Correlated Normal Beliefs
Peter Frazier, Warren Powell, and Savas Dayanik, 2009

A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The NeuroHassle Approach
Thomas Gabel, Martin Riedmiller, and Florian Trost, 2009

Computational Sustainability: Computational Methods for a Sustainable Environment, Economy, and Society
Carla P. Gomes, 2009

Improving Optimistic Exploration in Model-Free Reinforcement Learning
Marek Grześ and Daniel Kudenko, 2009

The WEKA Data Mining Software: An Update
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten, 2009

The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen, 2009

A Method for Handling Uncertainty in Evolutionary Optimization With an Application to Feedback Control of Combustion
Nikolaus Hansen, André S.P. Niederberger, Lino Guzzella, and Petros Koumoutsakos, 2009

Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
Verena Heidrich-Meisner and Christian Igel, 2009

Neuroevolution strategies for episodic reinforcement learning
Verena Heidrich-Meisner and Christian Igel, 2009

Probabilistic Balance Monitoring for Bipedal Robots
O. Höhn and W. Gerth, 2009

SarsaLandmark: an algorithm for learning in POMDPs with landmarks
Michael R. James and Satinder Singh, 2009

Generalized AMOC Curves For Evaluation and Improvement of Event Surveillance
Xia Jiang, Gregory F. Cooper, and Daniel B. Neill, 2009

Feature Selection for Value Function Approximation Using Bayesian Model Selection
Tobias Jung and Peter Stone, 2009

An empirical analysis of value function-based and policy search reinforcement learning
Shivaram Kalyanakrishnan and Peter Stone, 2009

The UT Austin Villa 3D Simulation Soccer Team 2008
Shivaram Kalyanakrishnan, Yinon Bentor, and Peter Stone, 2009

Fall detection in walking robots by multi-way principal component analysis
J. G. Daniël Karssen and Martijn Wisse, 2009

Learning motor primitives for robotics
Jens Kober and Jan Peters, 2009

Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009

Regularization and feature selection in least-squares temporal difference learning
J. Zico Kolter and Andrew Y. Ng, 2009

Automatic Parameter Optimization for a Dynamic Robot Simulation
Tim Laue and Matthias Hebbel, 2009

Learning Representation and Control in Markov Decision Processes: New Frontiers
Sridhar Mahadevan, 2009

Nonparametric representation of an approximated Poincaré map for learning biped locomotion
Jun Morimoto and Christopher G. Atkeson, 2009

Reinforcement learning in the brain
Yael Niv, 2009

Biasing Approximate Dynamic Programming with a Lower Discount Factor
Marek Petrik and Bruno Scherrer, 2009

Feature Discovery in Approximate Dynamic Programming
Philippe Preux, Sertan Girgin, and Manuel Loth, 2009

Reinforcement learning for robot soccer
Martin Riedmiller, Thomas Gabel, Roland Hafner, and Sascha Lange, 2009

Evolving Multi-modal Behavior in NPCs
Jacob Schrum and Risto Miikkulainen, 2009

Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009

Stochastic search using the natural gradient
Yi Sun, Daan Wierstra, Tom Schaul, and Jürgen Schmidhuber, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora, 2009

Efficient covariance matrix update for variable metric evolution strategies
Thorsten Suttorp, Nikolaus Hansen, and Christian Igel, 2009

Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
Michael T. Todd, Yael Niv, and Jonathan D. Cohen, 2009

Ontogenetic and Phylogenetic Reinforcement Learning
Julian Togelius, Tom Schaul, Daan Wierstra, Christian Igel, Faustino Gomez, and Jürgen Schmidhuber, 2009

Generalized Domains for Empirical Evaluations in Reinforcement Learning
Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone, 2009

Designing falling motions for a humanoid soccer goalie
Tobias Wilken, Marcell Missura, and Sven Behnke, 2009

Safe Fall: Humanoid robot fall direction change through intelligent stepping and inertia shaping
Seung-kook Yun, Ambarish Goswami, and Yoshiaki Sakagami, 2009

CMDragons 2009 Extended Team Description
Stefan Zickler, James Bruce, Joydeep Biswas, Michael Licitra, and Manuela Veloso, 2009

A Theoretical and Empirical Analysis of Expected Sarsa
Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering, 2009

Learning to fall: Designing low damage fall sequences for humanoid soccer robots
J. Ruiz-del-Solar, R. Palma-Amestoy, R. Marchant, I. Parra-Tsunekawa, and P. Zegers, 2009

Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, and Mark Lee, 2008

A Comprehensive Survey of Multiagent Reinforcement Learning
Lucian Buşoniu, Robert Babuška, and Bart De Schutter, 2008

An empirical evaluation of supervised learning in high dimensions
Rich Caruana, Nikolaos Karampatziakis, and Ainur Yessenalina, 2008

Efficient Simulation Budget Allocation for Selecting an Optimal Subset
Chun-Hung Chen, Donghai He, Michael Fu, and Loo Hay Lee, 2008

The Role of Value Systems in Decision Making
Peter Dayan, 2008

Decision Theory, Reinforcement Learning, and the Brain
Peter Dayan and Nathaniel D. Daw, 2008

Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot
Gen Endo, Jun Morimoto, Takamitsu Matsubara, Jun Nakanishi, and Gordon Cheng, 2008

Simulation-Based Approach to General Game Playing
Hilmar Finnsson and Yngvi Björnsson, 2008

Feature Discovery in Reinforcement Learning Using Genetic Programming
Sertan Girgin and Philippe Preux, 2008

Accelerated Neural Evolution through Cooperatively Coevolved Synapses
Faustino Gomez, Jürgen Schmidhuber, and Risto Miikkulainen, 2008

Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau, 2008

Similarities and differences between policy gradient methods and evolution strategies
Verena Heidrich-Meisner and Christian Igel, 2008

Evolution Strategies for Direct Policy Search
Verena Heidrich-Meisner and Christian Igel, 2008

Variable Metric Reinforcement Learning Methods Applied to the Noisy Mountain Car Problem
Verena Heidrich-Meisner and Christian Igel, 2008

Temporal Difference Updating without a Learning Rate
Marcus Hutter and Shane Legg, 2008

A new perspective to the keepaway soccer: the takers
Atil Iscen and Umut Erogul, 2008

Model-Based Reinforcement Learning in a Complex Domain
Shivaram Kalyanakrishnan, Peter Stone, and Yaxin Liu, 2008

Cross-Entropy Method for Reinforcement Learning
Steijn Kistemaker, 2008

Multi-armed bandits in metric spaces
Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal, 2008

Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
William B. Langdon, Riccardo Poli, Nicholas Freitag McPhee, and John R. Koza, 2008

A worst-case comparison between temporal difference and residual gradient with linear function approximation
Lihong Li, 2008

An analysis of reinforcement learning with function approximation
Francisco S. Melo, Sean P. Meyn, and M. Isabel Ribeiro, 2008

Analysis of an Evolutionary Reinforcement Learning Method in a Multiagent Domain
Jan Hendrik Metzen, Mark Edgington, Yohannes Kassahun, and Frank Kirchner, 2008

Empirical Bernstein stopping
Volodymyr Mnih, Csaba Szepesvári, and Jean-Yves Audibert, 2008

Real-time selection and generation of fall damage reduction actions for humanoid robots
Kunihiro Ogata, Koji Terada, and Yasuo Kuniyoshi, 2008

Advanced Data Mining Techniques
David L. Olson and Dursun Delen, 2008

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, and Michael L. Littman, 2008

Reinforcement learning of motor skills with policy gradients
Jan Peters and Stefan Schaal, 2008

Natural Actor-Critic
Jan Peters and Stefan Schaal, 2008

Sample-based Learning and Search with Permanent and Transient Memories
David Silver, Richard S. Sutton, and Martin Müller, 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, and Michael Bowling, 2008

The many faces of optimism: a unifying approach
István Szita and András Lőrincz, 2008

Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey O. Kephart, Charles Lefurgy, David W. Levine, and Freeman Rawson, 2008

Viability and predictive control for safe locomotion
Pierre-Brice Wieber, 2008

Ensemble Algorithms in Reinforcement Learning
Marco Wiering and Hado van Hasselt, 2008

SATzilla: Portfolio-based Algorithm Selection for SAT
Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown, 2008

Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
Engin İpek, Onur Mutlu, José F. Martínez, and Rich Caruana, 2008

Tuning Bandit Algorithms in Stochastic Environments
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2007

Sample Complexity of Policy Search with Known Dynamics
Peter L. Bartlett and Ambuj Tewari, 2007

Distinguishing falls from normal ADL using vertical velocity profiles
Alan K. Bourke, Karol J. O'Donovan, and Gearóid M. ÓLaighin, 2007

An optimal planning of falling motions of a humanoid robot
Kiyoshi Fujiwara, Shuuji Kajita, Kensuke Harada, Kenji Kaneko, Mitsuharu Morisawa, Fumio Kanehiro, Shinichiro Nakaoka, and Hirohisa Hirukawa, 2007

Bayesian actor-critic algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007

Bayesian Policy Gradient Algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007

Human-Robot Interaction: A Survey
Michael A. Goodrich and Alan C. Schultz, 2007

Approximation Algorithms for Budgeted Learning Problems
Sudipto Guha and Kamesh Munagala, 2007

Learning RoboCup-Keepaway with Kernels
Tobias Jung and Daniel Polani, 2007

Batch Reinforcement Learning in a Complex Domain
Shivaram Kalyanakrishnan and Peter Stone, 2007

Half Field Offense in RoboCup Soccer: A Multiagent Reinforcement Learning Case Study
Shivaram Kalyanakrishnan, Yaxin Liu, and Peter Stone, 2007

The UT Austin Villa 3D Simulation Soccer Team 2007
Shivaram Kalyanakrishnan and Peter Stone, 2007

Recent advances in ranking and selection
Seong-Hee Kim and Barry L. Nelson, 2007

Large Scale Reinforcement Learning using Q-Sarsa($\lambda$) and Cascading Neural Networks
Steffen Nissen, 2007

Fall detection - Principles and Methods
N. Noury, A. Fleury, P. Rumeau, A. K. Bourke, G. ÓLaighin, V. Rialle, and J.E. Lundy, 2007

Falling Motion Control for Humanoid Robots While Walking
Kunihiro Ogata, Koji Terada, and Yasuo Kuniyoshi, 2007

Efficient Failure Detection on Mobile Robots Using Particle Filters with Gaussian Process Proposals
Christian Plagemann, Dieter Fox, and Wolfram Burgard, 2007

On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup
Martin Riedmiller and Thomas Gabel, 2007

Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark
Martin Riedmiller, Jan Peters, and Stefan Schaal, 2007

Autonomous blimp control using model-free reinforcement learning in a continuous state and action space
Axel Rottmann, Christian Plagemann, Peter Hilgers, and Wolfram Burgard, 2007

Learning classifier systems: a survey
Olivier Sigaud and Stewart W. Wilson, 2007

Reinforcement Learning of Local Shape in the Game of Go
David Silver, Richard S. Sutton, and Martin Müller, 2007

On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007

Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man
István Szita and András Lőrincz, 2007

Representation Transfer for Reinforcement Learning
Matthew E. Taylor and Peter Stone, 2007

Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007

On the use of hybrid reinforcement learning for autonomic resource allocation
Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, and Mohamed N. Bennani, 2007

Adaptive Representations for Reinforcement Learning
Shimon Azariah Whiteson, 2007

Piecewise-Linear Pattern Generator and Reflex System for Humanoid Robots
Riadh Zaier and Shinji Kanda, 2007

See, walk, and kick: Humanoid robots start to play soccer
Sven Behnke, Michael Schreiber, Jörg Stückler, Reimund Renner, and Hauke Strasdat, 2006

Pattern Recognition and Machine Learning
Christopher M. Bishop, 2006

An empirical comparison of supervised learning algorithms
Rich Caruana and Alexandru Niculescu-Mizil, 2006

Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
Thomas Degris, Olivier Sigaud, and Pierre-Henri Wuillemin, 2006

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2006

Tetris: A Study of Randomized Constraint Sampling
Vivek F. Farias and Benjamin Van Roy, 2006

Towards an Optimal Falling Motion for a Humanoid Robot
Kiyoshi Fujiwara, Shuuji Kajita, Kensuke Harada, Kenji Kaneko, Mitsuharu Morisawa, Fumio Kanehiro, Shinichiro Nakaoka, and Hirohisa Hirukawa, 2006

Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
Abraham P. George and Warren B. Powell, 2006

Hierarchical multi-agent reinforcement learning
Mohammad Ghavamzadeh, Sridhar Mahadevan, and Rajbala Makar, 2006

Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot
Kentarou Hitomi, Tomohiro Shibata, Yutaka Nakamura, and Shin Ishii, 2006

An Overview of Cooperative and Competitive Multiagent Learning
Pieter Jan 't Hoen, Karl Tuyls, Liviu Panait, Sean Luke, and Johannes A. La Poutré, 2006

Looping suffix tree-based inference of partially observable hidden state
Michael P. Holmes and Charles Lee Isbell, Jr., 2006

Bandit Based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006

Evolving a Real-World Vehicle Warning System
Nate Kohl, Kenneth Stanley, Risto Miikkulainen, Michael Samples, and Rini Sherony, 2006

Stepping Motion for a Human-like Character to Maintain Balance against Large Perturbations
Shunsuke Kudoh, Taku Komura, and Katsushi Ikeuchi, 2006

Quadruped Robot Obstacle Negotiation via Reinforcement Learning
Honglak Lee, Yirong Shen, Chih-Han Yu, Gurjeet Singh, and Andrew Y. Ng, 2006

Relaxed fault detection and isolation: An application to a nonlinear case study
Raffaella Mattone and Alessandro De Luca, 2006

Reinforcement learning for optimized trade execution
Yuriy Nevmyvaka, Yi Feng, and Michael Kearns, 2006

Balance Control of a Humanoid Robot Based on the Reaction Null Space Method
Akinori Nishio, Kentaro Takahashi, and Dragomir N. Nenchev, 2006

Anytime Point-Based Approximations for Large POMDPs
Joelle Pineau, Geoffrey J. Gordon, and Sebastian Thrun, 2006

Capture Point: A Step toward Humanoid Push Recovery
Jerry Pratt, John Carff, Sergey Drakunov, and Ambarish Goswami, 2006

Instability Detection and Fall Avoidance for a Humanoid using Attitude Sensors and Reflexes
Reimund Renner and Sven Behnke, 2006

Integrating Techniques from Statistical Ranking into Evolutionary Algorithms
Christian Schmidt, Jürgen Branke, and Stephen E. Chick, 2006

Keepaway Soccer: From Machine Learning Testbed to Benchmark
Peter Stone, Gregory Kuhlmann, Matthew E. Taylor, and Yaxin Liu, 2006

PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006

Getting Back on Two Feet: Reliable Standing-up Routines for a Humanoid Robot
Jörg Stückler, Johannes Schwenk, and Sven Behnke, 2006

Learning Tetris using the noisy cross-entropy method
István Szita and András Lőrincz, 2006

Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006

On-line evolutionary computation for reinforcement learning in stochastic domains
Shimon Whiteson and Peter Stone, 2006

An Evolutionary Approach to Tetris
Niko Böhm, Gabriella Kókai, and Stefan Mandl, 2005

An Adaptive Sampling Algorithm for Solving Markov Decision Processes
Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu, and Steven I. Marcus, 2005

Tree-Based Batch Mode Reinforcement Learning
Damien Ernst, Pierre Geurts, and Louis Wehenkel, 2005

Sensory reflex control for humanoid walking
Qiang Huang and Yoshihiko Nakamura, 2005

Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005

Evolving a Neural Network Location Evaluator to Play Ms. Pac-Man
Simon M. Lucas, 2005

Basis Function Adaptation in Temporal Difference Reinforcement Learning
Ishai Menache, Shie Mannor, and Nahum Shimkin, 2005

Spark - A generic simulator for physical multi-agent simulations
Oliver Obst and Markus Rollmann, 2005

Cooperative Multi-Agent Learning: The State of the Art
Liviu Panait and Sean Luke, 2005

Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Martin Riedmiller, 2005

Function Approximation via Tile Coding: Automating Parameter Choice
Alexander A. Sherstov and Peter Stone, 2005

Reinforcement Learning for RoboCup-Soccer Keepaway
Peter Stone, Richard S. Sutton, and Gregory Kuhlmann, 2005

A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005

Zero-Moment Point - Thirty Five Years of its Life
Miomir Vukobratović and Branislav Borovac, 2005

Evolving Soccer Keepaway Players Through Task Decomposition
Shimon Whiteson, Nate Kohl, Risto Miikkulainen, and Peter Stone, 2005

Data Mining: Practical machine learning tools and techniques
Ian H. Witten and Eibe Frank, 2005

A Tutorial on the Cross-Entropy Method
Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, and Reuven Y. Rubinstein, 2005

Sequential Sampling in Noisy Environments
Jürgen Branke and Christian Schmidt, 2004

Tetris is hard, even to approximate
Ron Breukelaar, Erik D. Demaine, Susan Hohenberger, Hendrik Jan Hoogeboom, Walter A. Kosters, and David Liben-Nowell, 2004

Failure diagnosis using decision trees
Mike Chen, Alice X. Zheng, Jim Lloyd, Michael I. Jordan, and Eric Brewer, 2004

Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis
Claudia V. Goldman and Shlomo Zilberstein, 2004

Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone, 2004

Sparse cooperative Q-learning
Jelle R. Kok and Nikos Vlassis, 2004

Reinforcement learning for sensing strategies
Cody Kwok and Dieter Fox, 2004

Distinctive Image Features from Scale-Invariant Keypoints
David G. Lowe, 2004

Active Model Selection
Omid Madani, Daniel J. Lizotte, and Russell Greiner, 2004

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2004

Convergence of synchronous reinforcement learning with linear function approximation
Artur Merke and Ralf Schoknecht, 2004

Webots™: Professional Mobile Robot Simulation
Olivier Michel, 2004

Autonomous Helicopter Flight via Reinforcement Learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, and Shankar Sastry, 2004

On the Numeric Stability of Gaussian Processes Regression for Relational Reinforcement Learning
Jan Ramon and Kurt Driessens, 2004

Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning
Bohdana Ratitch and Doina Precup, 2004

Multi-Agent Patrolling with Reinforcement Learning
Hugo Santana, Geber Ramalho, Vincent Corruble, and Bohdana Ratitch, 2004

Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004

Efficient Evolution of Neural Networks Through Complexification
Kenneth Owen Stanley, 2004

Stochastic policy gradient reinforcement learning on a simple 3D biped
Russ Tedrake, Teresa Weirui Zhang, and H. Sebastian Seung, 2004

GenSo-EWS: a novel neural-fuzzy based early warning system for predicting bank failures
W. L. Tung, C. Quek, and P. Cheng, 2004

Adaptive Job Routing and Scheduling
Shimon Whiteson and Peter Stone, 2004

A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations
Bram Bakker, Viktor Zhumatiy, Gabriel Gruener, and Jürgen Schmidhuber, 2003

Using Ranking and Selection to “Clean Up” after Simulation Optimization
Justin Boesel, Barry L. Nelson, and Seong-Hee Kim, 2003

R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003

Users Manual: RoboCup Soccer Server --- for Soccer Server Version 7.07 and Later
Mao Chen, Klaus Dorer, Ehsan Foroughi, Fredrick Heintz, ZhanXiang Huang, Spiros Kapetanakis, Kostas Kostiadis, Johan Kummeneje, Jan Murray, Itsuki Noda, Oliver Obst, Pat Riley, Timo Steffens, Yi Wang, and Xiang Yin, 2003

SPEEDY: A Fall Detector in a Wrist Watch
Thomas Degen, Heinz Jaeckel, Michael Rufer, and Stefan Wyss, 2003

Learning to play Pac-Man: An Evolutionary, Rule-based Approach
Marcus Gallagher and Amanda Ryan, 2003

Active Guidance for a Finless Rocket Using Neuroevolution
Faustino J. Gomez and Risto Miikkulainen, 2003

Biped walking pattern generation by a simple three-dimensional inverted pendulum model
Shuuji Kajita, Fumio Kanehiro, Kenji Kaneko, Kiyoshi Fujiwara, Kazuhito Yokoi, and Hirohisa Hirukawa, 2003

Survey of Intelligent Control Techniques for Humanoid Robots
Duško Katić and Miomir Vukobratović, 2003

On Actor-Critic Algorithms
Vijay R. Konda and John N. Tsitsiklis, 2003

Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr, 2003

Reinforcement Learning as Classification: Leveraging Modern Classifiers
Michail G. Lagoudakis and Ronald Parr, 2003

Boosting as a Metaphor for Algorithm Design
Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham, 2003

Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2003
Details   

Least Squares Policy Evaluation Algorithms with Linear Function Approximation
A. Nedić and D. P. Bertsekas, 2003
Details   

A Convergent Form of Approximate Policy Iteration
Theodore J. Perkins and Doina Precup, 2003
Details   

Using MDP Characteristics to Guide Exploration in Reinforcement Learning
Bohdana Ratitch and Doina Precup, 2003
Details   

Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
Ralf Schoknecht, 2003
Details   

An Agent that Learns to Play Pacman
Donald Shepherd, 2003
Details   

Introduction to Stochastic Search and Optimization
James C. Spall, 2003
Details   

Monitoring and early warning for Internet worms
Cliff Changchun Zou, Lixin Gao, Weibo Gong, and Don Towsley, 2003
Details   

Scaling Internal-State Policy-Gradient Methods for POMDPs
Douglas Aberdeen and Jonathan Baxter, 2002
Details   

Finite-time Analysis of the Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, 2002
Details   

The Nonstochastic Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire, 2002
Details   

Threshold selection, hypothesis tests, and DOE methods
Thomas Beielstein and Sandor Markon, 2002
Details   

The Complexity of Decentralized Control of Markov Decision Processes
Daniel S. Bernstein, Robert Givan, Neil Immerman, and Shlomo Zilberstein, 2002
Details   

An $\epsilon$-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes
Blai Bonet, 2002
Details   

Technical Update: Least-Squares Temporal Difference Learning
Justin A. Boyan, 2002
Details   

Deep Blue
Murray Campbell, A. Joseph Hoane Jr., and Feng-hsiung Hsu, 2002
Details   

PAC Bounds for Multi-armed Bandit and Markov Decision Processes
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2002
Details   

Optimization for simulation: Theory vs. Practice
Michael C. Fu, 2002
Details   

UKEMI: Falling motion control to minimize damage to biped humanoid robot
Kiyoshi Fujiwara, Fumio Kanehiro, Shuji Kajita, Kenji Kaneko, Kazuhito Yokoi, and Hirohisa Hirukawa, 2002
Details   

Coordinated Reinforcement Learning
Carlos Guestrin, Michail G. Lagoudakis, and Ronald Parr, 2002
Details   

Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Shanti S. Gupta and S. Panchapakesan, 2002
Details   

Mining complex models from arbitrarily large databases in constant time
Geoff Hulten and Pedro Domingos, 2002
Details   

Discriminative, Generative and Imitative learning
Tony Jebara, 2002
Details   

Approximately Optimal Approximate Reinforcement Learning
Sham Kakade and John Langford, 2002
Details   

Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002
Details   

A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
Michael Kearns, Yishay Mansour, and Andrew Y. Ng, 2002
Details   

Least-Squares Methods in Reinforcement Learning for Control
Michail G. Lagoudakis, Ronald Parr, and Michael L. Littman, 2002
Details   

Variable Resolution Discretization in Optimal Control
Rémi Munos and Andrew Moore, 2002
Details   

Balance control analysis of humanoid robot based on ZMP feedback control
Napoleon, Shigeki Nakaura, and Mitsuji Sampei, 2002
Details   

Kernel-Based Reinforcement Learning
Dirk Ormoneit and Śaunak Sen, 2002
Details   

On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains
Theodore J. Perkins and Mark D. Pendrith, 2002
Details   

Reinforcement Learning for POMDPs Based on Action Values and Stochastic Optimization
Theodore J. Perkins, 2002
Details   

Learning from Scarce Experience
Leonid Peshkin and Christian R. Shelton, 2002
Details   

Characterizing Markov Decision Processes
Bohdana Ratitch and Doina Precup, 2002
Details   

The intelligent ASIMO: system overview and integration
Yoshiaki Sakagami, Ryujin Watanabe, Chiaki Aoyama, Shinichi Matsunaga, Nobuo Higaki, and Kikuo Fujimura, 2002
Details   

A Perspective View and Survey of Meta-Learning
Ricardo Vilalta and Youssef Drissi, 2002
Details   

On the stability of walking systems
Pierre-Brice Wieber, 2002
Details   

Evolution strategies in noisy environments - a survey of existing work
D. V. Arnold, 2001
Details   

Scaling to Very Very Large Corpora for Natural Language Disambiguation
Michele Banko and Eric Brill, 2001
Details   

Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter and Peter L. Bartlett, 2001
Details   

Random Forests
Leo Breiman, 2001
Details   

Batch Value Function Approximation via Support Vectors
Thomas G. Dietterich and Xin Wang, 2001
Details   

Convergence of Optimistic and Incremental Q-Learning
Eyal Even-Dar and Yishay Mansour, 2001
Details   

Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State
Matthew R. Glickman and Katia Sycara, 2001
Details   

Algorithm portfolios
Carla P. Gomes and Bart Selman, 2001
Details   

Max-norm Projections for Factored MDPs
Carlos Guestrin, Daphne Koller, and Ronald Parr, 2001
Details   

Multiagent Planning with Factored MDPs
Carlos Guestrin, Daphne Koller, and Ronald Parr, 2001
Details   

AutoBalancer: An Online Dynamic Balance Compensation Scheme for Humanoid Robots
Satoshi Kagami, Fumio Kanehiro, Yukiharu Tamiya, Masayuki Inaba, and Hirochika Inoue, 2001
Details   

A Natural Policy Gradient
Sham Kakade, 2001
Details   

A fully sequential procedure for indifference-zone selection in simulation
Seong-Hee Kim and Barry L. Nelson, 2001
Details   

Thresholding - a selection operator for noisy ES
Sandor Markon, Dirk V. Arnold, Thomas Bäck, Thomas Beielstein, and Hans-Georg Beyer, 2001
Details   

Learning to trade via direct reinforcement
John Moody and Matthew Saffell, 2001
Details   

On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
Andrew Y. Ng and Michael I. Jordan, 2001
Details   

Off-Policy Temporal Difference Learning with Function Approximation
Doina Precup, Richard S. Sutton, and Sanjoy Dasgupta, 2001
Details   

On the Convergence of Temporal-Difference Learning with Linear Function Approximation
Vladislav Tadić, 2001
Details   

Reinforcement Learning in POMDP's via Direct Gradient Ascent
Jonathan Baxter and Peter L. Bartlett, 2000
Details   

Evolutionary algorithms in noisy environments: theoretical issues and guidelines for practice
Hans-Georg Beyer, 2000
Details   

Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
Thomas G. Dietterich, 2000
Details   

Mining high-speed data streams
Pedro Domingos and Geoff Hulten, 2000
Details   

Planning treatment of ischemic heart disease with partially observable Markov decision processes
Milos Hauskrecht and Hamish Fraser, 2000
Details   

Value-Function Approximations for Partially Observable Markov Decision Processes
Milos Hauskrecht, 2000
Details   

Local Search Algorithms for SAT: An Empirical Evaluation
Holger H. Hoos and Thomas Stützle, 2000
Details   

Policy Iteration for Factored MDPs
Daphne Koller and Ronald Parr, 2000
Details   

Policy Search via Density Estimation
Andrew Y. Ng, Ronald Parr, and Daphne Koller, 2000
Details   

PEGASUS: A policy search method for large MDPs and POMDPs
Andrew Y. Ng and Michael Jordan, 2000
Details   

Meta-Learning by Landmarking Various Learning Algorithms
Bernhard Pfahringer, Hilan Bensusan, and Christophe Giraud-Carrier, 2000
Details   

Exploiting Inherent Robustness and Natural Dynamics in the Control of Bipedal Walking Robots
Jerry E. Pratt, 2000
Details   

Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000
Details   

Optimization of Noisy Fitness Functions by Means of Genetic Algorithms Using History of Search
Yasuhito Sano and Hajime Kita, 2000
Details   

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000
Details   

Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour, 2000
Details   

Monte Carlo POMDPs
Sebastian Thrun, 2000
Details   

Gradient Descent for General Reinforcement Learning
Leemon Baird and Andrew Moore, 1999
Details   

An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
Eric Bauer and Ron Kohavi, 1999
Details   

Reinforcement Learning for Control of Self-Similar Call Traffic in Broadband Networks
Jakob Carlström and Ernst Nordström, 1999
Details   

Activity Monitoring: Noticing Interesting Changes in Behavior
Tom Fawcett and Foster Provost, 1999
Details   

Selecting and Ordering Populations: A New Statistical Methodology
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, 1999
Details   

Solving Non-Markovian Control Tasks with Neuro-Evolution
Faustino J. Gomez and Risto Miikkulainen, 1999
Details   

An empirical evaluation of several methods to select the best system
Koichiro Inoue, Stephen E. Chick, and Chun-Hung Chen, 1999
Details   

Evolutionary Algorithms for Reinforcement Learning
David E. Moriarty, Alan C. Schultz, and John J. Grefenstette, 1999
Details   

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999
Details   

Convergence of Reinforcement Learning With General Function Approximators
Vassilis A. Papavassiliou and Stuart Russell, 1999
Details   

Reinforcement Learning Using Approximate Belief States
Andrés Rodríguez, Ronald Parr, and Daphne Koller, 1999
Details   

Distributed Value Functions
Jeff Schneider, Weng-Keen Wong, Andrew Moore, and Martin Riedmiller, 1999
Details   

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Richard S. Sutton, Doina Precup, and Satinder P. Singh, 1999
Details   

On-Line New Event Detection and Tracking
James Allan, Ron Papka, and Victor Lavrenko, 1998
Details   

Learning hierarchical control structures for multiple tasks and changing environments
Bruce L. Digney, 1998
Details   

Robot Shaping: An Experiment in Behavior Engineering
Marco Dorigo and Marco Colombetti, 1998
Details   

Neural Networks: A Comprehensive Foundation
Simon Haykin, 1998
Details   

Symposium on Applications of Reinforcement Learning: Final Report for NSF Grant IIS-9810208
Pat Langley and Mark Pendrith, 1998
Details   

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
John Loch and Satinder Singh, 1998
Details   

Q2: Memory-Based Active Learning for Optimizing Noisy Continuous Functions
Andrew W. Moore, Jeff G. Schneider, Justin A. Boyan, and Mary S. Lee, 1998
Details   

Hierarchical Control and Learning for Markov Decision Processes
Ronald Edward Parr, 1998
Details   

An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
Mark D. Pendrith and Michael J. McGarity, 1998
Details   

Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998
Details   

Averaging Efficiently in the Presence of Noise
Peter Stagge, 1998
Details   

Layered Learning in Multi-Agent Systems
Peter Stone, 1998
Details   

Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998
Details   

Learning and Value Function Approximation in Complex Decision Processes
Benjamin Van Roy, 1998
Details   

A Comparison of Direct and Model-Based Reinforcement Learning
Christopher G. Atkeson and Juan Carlos Santamaría, 1997
Details   

How to Lose at Tetris
Heidi Burgiel, 1997
Details   

Multitask Learning
Rich Caruana, 1997
Details   

Machine-Learning Research: Four Current Directions
Thomas G. Dietterich, 1997
Details   

The Racing Algorithm: Model Selection for Lazy Learners
Oded Maron and Andrew W. Moore, 1997
Details   

Reinforcement Learning in the Multi-Robot Domain
Maja J. Matarić, 1997
Details   

Alarm effectiveness in driver-centred collision-warning systems
R. Parasuraman, P. A. Hancock, and O. Olofinboba, 1997
Details   

Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
Satinder Singh and Dimitri Bertsekas, 1997
Details   

An analysis of temporal-difference learning with function approximation
John N. Tsitsiklis and Benjamin Van Roy, 1997
Details   

No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997
Details   

Exponentially many local minima for single neurons
Peter Auer, Mark Herbster, and Manfred K. Warmuth, 1996
Details   

Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
Details   

Linear Least-Squares Algorithms for Temporal Difference Learning
Steven J. Bradtke and Andrew G. Barto, 1996
Details   

Improving Elevator Performance Using Reinforcement Learning
Robert H. Crites and Andrew G. Barto, 1996
Details   

Experiments with a New Boosting Algorithm
Yoav Freund and Robert E. Schapire, 1996
Details   

Stable Fitted Reinforcement Learning
Geoffrey J. Gordon, 1996
Details   

Simulated Annealing for noisy cost functions
Walter J. Gutjahr and Georg Ch. Pflug, 1996
Details   

Reinforcement Learning with Selective Perception and Hidden State
Andrew Kachites McCallum, 1996
Details   

Genetic Algorithms, Selection Schemes, and the Varying Effects of Noise
Brad L. Miller and David E. Goldberg, 1996
Details   

Memory-based Stochastic Optimization
Andrew W. Moore and Jeff Schneider, 1996
Details   

Incremental Multi-Step Q-Learning
Jing Peng and Ronald J. Williams, 1996
Details   

Bagging, Boosting, and C4.5
J. Ross Quinlan, 1996
Details   

Evolution-Based Discovery of Hierarchical Behaviors
Justinian P. Rosca and Dana H. Ballard, 1996
Details   

Reinforcement learning with replacing eligibility traces
Satinder P. Singh and Richard S. Sutton, 1996
Details   

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
Richard S. Sutton, 1996
Details   

Feature-based methods for large scale dynamic programming
John N. Tsitsiklis and Benjamin Van Roy, 1996
Details   

Residual Algorithms: Reinforcement Learning with Function Approximation
Leemon Baird, 1995
Details   

Design and analysis of experiments for statistical selection, screening, and multiple comparisons
Robert E. Bechhofer, Thomas J. Santner, and David M. Goldsman, 1995
Details   

A Counterexample to Temporal Differences Learning
Dimitri P. Bertsekas, 1995
Details   

Generalization in Reinforcement Learning: Safely Approximating the Value Function
Justin A. Boyan and Andrew W. Moore, 1995
Details   

Recursive Automatic Bias Selection for Classifier Construction
Carla E. Brodley, 1995
Details   

Stable Function Approximation in Dynamic Programming
Geoffrey J. Gordon, 1995
Details   

Evaluation and Selection of Biases in Machine Learning
Diana F. Gordon and Marie desJardins, 1995
Details   

Strongly Typed Genetic Programming in Evolving Cooperation Strategies
Thomas Haynes, Roger L. Wainwright, Sandip Sen, and Dale A. Schoenefeld, 1995
Details   

Reinforcement Learning Algorithm for Partially Observable Markov Problems
Tommi Jaakkola, Satinder P. Singh, and Michael I. Jordan, 1995
Details   

Applications of machine learning and rule induction
Pat Langley and Herbert A. Simon, 1995
Details   

On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995
Details   

Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State
R. Andrew McCallum, 1995
Details   

The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces
Andrew W. Moore and Christopher G. Atkeson, 1995
Details   

Approximating Optimal Policies for Partially Observable Stochastic Domains
Ronald Parr and Stuart Russell, 1995
Details   

Methods for Competitive Co-Evolution: Finding Opponents Worth Beating
Christopher D. Rosin and Richard K. Belew, 1995
Details   

Problem Solving with Reinforcement Learning
Gavin Adrian Rummery, 1995
Details   

Sequential PAC Learning
Dale Schuurmans and Russell Greiner, 1995
Details   

Artificial Intelligence: An Empirical Science
Herbert A. Simon, 1995
Details   

Reinforcement Learning with Soft State Aggregation
Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan, 1995
Details   

A Reinforcement Learning Approach to job-shop Scheduling
Wei Zhang and Thomas G. Dietterich, 1995
Details   

Acting optimally in partially observable stochastic domains
Anthony R. Cassandra, Leslie Pack Kaelbling, and Michael L. Littman, 1994
Details   

Using a Genetic Algorithm to Search for the Representational Bias of a Collective Reinforcement Learner
Helen G. Cobb and Peter Bock, 1994
Details   

TD($\lambda$) Converges with Probability 1
Peter Dayan and Terrence J. Sejnowski, 1994
Details   

An Introduction to Computational Learning Theory
Michael J. Kearns and Umesh V. Vazirani, 1994
Details   

Markov Decision Processes
Martin L. Puterman, 1994
Details   

On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994
Details   

Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Satinder P. Singh, Tommi Jaakkola, and Michael I. Jordan, 1994
Details   

An Upper Bound on the Loss from Approximate Optimal-Value Functions
Satinder P. Singh and Richard C. Yee, 1994
Details   

On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994
Details   

Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
Ronald J. Williams and Leemon C. Baird III, 1994
Details   

Reinforcement Learning Applied to Linear Quadratic Regulation
Steven J. Bradtke, 1993
Details   

Benchmarks, Test Beds, Controlled Experimentation, and the Design of Agent Architectures
Steve Hanks, Martha E. Pollack, and Paul R. Cohen, 1993
Details   

Reinforcement learning with hidden states
Long-Ji Lin and Tom M. Mitchell, 1993
Details   

An Optimization-based Categorization of Reinforcement Learning Environments
Michael L. Littman, 1993
Details   

Overcoming Incomplete Perception with Utile Distinction Memory
R. Andrew McCallum, 1993
Details   

Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
Andrew W. Moore and Christopher G. Atkeson, 1993
Details   

Efficient learning and planning within the Dyna framework
Jing Peng and Ronald J. Williams, 1993
Details   

Approximating Q-Values with Basis Function Representations
Philip Sabes, 1993
Details   

Online Learning with Random Representations
Richard S. Sutton and Steven D. Whitehead, 1993
Details   

Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents
Ming Tan, 1993
Details   

Issues in Using Function Approximation for Reinforcement Learning
Sebastian Thrun and Anton Schwartz, 1993
Details   

Interactions between Learning and Evolution
David Ackley and Michael Littman, 1992
Details   

Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach
Lonnie Chrisman, 1992
Details   

Inductive Biases in a Reinforcement Learner
Helen G. Cobb, 1992
Details   

The Convergence of TD($\lambda$) for General $\lambda$
Peter Dayan, 1992
Details   

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
Long-Ji Lin, 1992
Details   

Practical Issues in Temporal Difference Learning
Gerald Tesauro, 1992
Details   

Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992
Details   

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
Ronald J. Williams, 1992
Details   

Viability Theory
Jean-Pierre Aubin, 1991
Details   

Intelligence without Representation
Rodney A. Brooks, 1991
Details   

Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control
Ming Tan, 1991
Details   

Predicting Bank Failures in the 1980s
James B. Thomson, 1991
Details   

A Proportional Hazards Model of Bank Failure: An Examination of its Usefulness as an Early Warning Tool
Gary Whalen, 1991
Details   

Learning to perceive and act by trial and error
Steven D. Whitehead and Dana H. Ballard, 1991
Details   

Learning Sequential Decision Rules Using Simulation Models and Competition
John J. Grefenstette, Connie Loggia Ramsey, and Alan C. Schultz, 1990
Details   

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
Richard S. Sutton, 1990
Details   

Multilayer feedforward networks are universal approximators
Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White, 1989
Details   

Restricted Subset Selection Procedures for Simulation
David W. Sullivan and James R. Wilson, 1989
Details   

An algorithm for automated tsunami warning in French Polynesia based on mantle magnitudes
Jacques Talandier and Emile A. Okal, 1989
Details   

Learning from Delayed Rewards
Christopher John Cornish Hellaby Watkins, 1989
Details   

How Evaluation Guides AI Research: The Message Still Counts More than the Medium
Paul R. Cohen and Adele E. Howe, 1988
Details   

Genetic algorithms in noisy environments
J. Michael Fitzpatrick and John J. Grefenstette, 1988
Details   

Survey of model-based failure detection and isolation in complex plants
J. J. Gertler, 1988
Details   

Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework
David Haussler, 1988
Details   

Machine Learning as an Experimental Science
Pat Langley, 1988
Details   

Learning to Predict By the Methods of Temporal Differences
Richard S. Sutton, 1988
Details   

Further Real Applications of Markov Decision Processes
Douglas J. White, 1988
Details   

Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm
Nick Littlestone, 1987
Details   

On Optimal Cooperation of Knowledge Sources - An Empirical Investigation
M. Benda, V. Jagannathan, and R. Dodhiawala, 1986
Details   

Shift of Bias for Inductive Concept Learning
Paul E. Utgoff, 1986
Details   

Bandit problems
Donald A. Berry and Bert Fristedt, 1985
Details   

A procedure for selecting a subset of size $m$ containing the $l$ best of $k$ independent normal populations, with applications to simulation
Lloyd W. Koenig and Averill M. Law, 1985
Details   

Asymptotically Efficient Adaptive Allocation Rules
T. L. Lai and Herbert Robbins, 1985
Details   

Real Applications of Markov Decision Processes
Douglas J. White, 1985
Details   

Neuronlike adaptive elements that can solve difficult learning control problems
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, 1983
Details   

A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms
George E. Monahan, 1982
Details   

Brains, Behavior and Robotics
James Sacra Albus, 1981
Details   

The Need for Biases in Learning Generalizations
Tom M. Mitchell, 1980
Details   

Early Warning Indicators of Business Failure
Subhash Sharma and Vijay Mahajan, 1980
Details   

The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs
Edward J. Sondik, 1978
Details   

Determining Sample Size for Pretesting Comparative Effectiveness of Advertising Copies
Siddhartha R. Dalal and V. Srinivasan, 1977
Details   

Sequential models for clinical trials
Herman Chernoff, 1967
Details   

Optimal Control of Markov Processes with Incomplete State Information
K. J. Åström, 1965
Details   

A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations
Edward Paulson, 1964
Details   

Probability Inequalities for Sums of Bounded Random Variables
Wassily Hoeffding, 1963
Details   

The Future of Data Analysis
John W. Tukey, 1962
Details   

Comparing entries in random sample tests
W. A. Becker, 1961
Details   

A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs
Robert E. Bechhofer, 1958
Details   

Dynamic Programming
Richard Bellman, 1957
Details   

Some aspects of the sequential design of experiments
Herbert Robbins, 1952
Details   

Sequential Analysis
Abraham Wald, 1947
Details   

Contributions to the Theory of Sequential Analysis. I
M. A. Girshick, 1946
Details   

Contributions to the Theory of Sequential Analysis, II, III
M. A. Girshick, 1946
Details