 Almost Optimal Exploration in Multi-Armed Bandits
 Zohar Karnin,  Tomer Koren, and  Oren Somekh, 2013
 Information Complexity in Bandit Subset Selection
 Emilie Kaufmann and  Shivaram Kalyanakrishnan, 2013
 Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
 Victor Gabillon,  Mohammad Ghavamzadeh, and  Alessandro Lazaric, 2012
 Planning in Reward-Rich Domains via PAC Bandits
 Sergiu Goschin,  Ari Weinstein,  Michael L. Littman, and  Erick Chastain, 2012
 Best Arm Identification in Multi-Armed Bandits
 Jean-Yves Audibert,  Sébastien Bubeck, and  Rémi Munos, 2010
 UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
 Peter Auer and  Ronald Ortner, 2010
 Simulation optimization using the cross-entropy method with optimal computing budget allocation
 Donghai He,  Loo Hay Lee,  Chun-Hung Chen,  Michael C. Fu, and  Segev Wasserkrug, 2010
 An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
 Junya Honda and  Akimichi Takemura, 2010
 Non-Stochastic Bandit Slate Problems
 Satyen Kale,  Lev Reyzin, and  Robert E. Schapire, 2010
 Efficient Selection of Multiple Bandit Arms: Theory and Practice
 Shivaram Kalyanakrishnan and  Peter Stone, 2010
 Regret bounds for sleeping experts and bandits
 Robert Kleinberg,  Alexandru Niculescu-Mizil, and  Yogeshwer Sharma, 2010
 A contextual-bandit approach to personalized news article recommendation
 Lihong Li,  Wei Chu,  John Langford, and  Robert E. Schapire, 2010
 $\epsilon$-First Policies for Budget-Limited Multi-Armed Bandits
 Long Tran-Thanh,  Archie Chapman,  Enrique Munoz de Cote,  Alex Rogers, and  Nicholas R. Jennings, 2010
 Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
 Jean-Yves Audibert,  Rémi Munos, and  Csaba Szepesvári, 2009
 Pure Exploration in Multi-armed Bandits Problems
 Sébastien Bubeck,  Rémi Munos, and  Gilles Stoltz, 2009
 Combinatorial Bandits
 Nicolò Cesa-Bianchi and  Gábor Lugosi, 2009
 Efficient Simulation Budget Allocation for Selecting an Optimal Subset
 Chun-Hung Chen,  Donghai He,  Michael Fu, and  Loo Hay Lee, 2008
 Multi-armed bandits in metric spaces
 Robert Kleinberg,  Aleksandrs Slivkins, and  Eli Upfal, 2008
 Empirical Bernstein stopping
 Volodymyr Mnih,  Csaba Szepesvári, and  Jean-Yves Audibert, 2008
 Tuning Bandit Algorithms in Stochastic Environments
 Jean-Yves Audibert,  Rémi Munos, and  Csaba Szepesvári, 2007
 Approximation Algorithms for Budgeted Learning Problems
 Sudipto Guha and  Kamesh Munagala, 2007
 Recent advances in ranking and selection
 Seong-Hee Kim and  Barry L. Nelson, 2007
 Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
 Eyal Even-Dar,  Shie Mannor, and  Yishay Mansour, 2006
 Bandit Based Monte-Carlo Planning
 Levente Kocsis and  Csaba Szepesvári, 2006
 Active Model Selection
 Omid Madani,  Daniel J. Lizotte, and  Russell Greiner, 2004
 The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
 Shie Mannor and  John N. Tsitsiklis, 2004
 Using Ranking and Selection to Clean Up after Simulation Optimization
 Justin Boesel,  Barry L. Nelson, and  Seong-Hee Kim, 2003
 Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
 Shie Mannor and  John N. Tsitsiklis, 2003
 Finite-time Analysis of the Multiarmed Bandit Problem
 Peter Auer,  Nicolò Cesa-Bianchi, and  Paul Fischer, 2002
 The Nonstochastic Multiarmed Bandit Problem
 Peter Auer,  Nicolò Cesa-Bianchi,  Yoav Freund, and  Robert E. Schapire, 2002
 PAC Bounds for Multi-armed Bandit and Markov Decision Processes
 Eyal Even-Dar,  Shie Mannor, and  Yishay Mansour, 2002
 Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
 Shanti S. Gupta and  S. Panchapakesan, 2002
 Mining complex models from arbitrarily large databases in constant time
 Geoff Hulten and  Pedro Domingos, 2002
 A fully sequential procedure for indifference-zone selection in simulation
 Seong-Hee Kim and  Barry L. Nelson, 2001
 Mining high-speed data streams
 Pedro Domingos and  Geoff Hulten, 2000
 Selecting and Ordering Populations: A New Statistical Methodology
 Jean Dickinson Gibbons,  Ingram Olkin, and  Milton Sobel, 1999
 An empirical evaluation of several methods to select the best system
 Koichiro Inoue,  Stephen E. Chick, and  Chun-Hung Chen, 1999
 Design and analysis of experiments for statistical selection, screening, and multiple comparisons
 Robert E. Bechhofer,  Thomas J. Santner, and  David M. Goldsman, 1995
 Sequential PAC Learning
 Dale Schuurmans and  Russell Greiner, 1995
 Restricted Subset Selection Procedures for Simulation
 David W. Sullivan and  James R. Wilson, 1989
 Bandit problems
 Donald A. Berry and  Bert Fristedt, 1985
 A procedure for selecting a subset of size $m$ containing the $l$ best of $k$ independent normal populations, with applications to simulation
 Lloyd W. Koenig and  Averill M. Law, 1985
 Asymptotically Efficient Adaptive Allocation Rules
 T. L. Lai and  Herbert Robbins, 1985
 Determining Sample Size for Pretesting Comparative Effectiveness of Advertising Copies
 Siddhartha R. Dalal and  V. Srinivasan, 1977
 Sequential models for clinical trials
 Herman Chernoff, 1967
 A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations
 Edward Paulson, 1964
 Probability Inequalities for Sums of Bounded Random Variables
 Wassily Hoeffding, 1963
 Comparing entries in random sample tests
 W. A. Becker, 1961
 A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs
 Robert E. Bechhofer, 1958
 Some aspects of the sequential design of experiments
 Herbert Robbins, 1952
 Sequential Analysis
 Abraham Wald, 1947
 Contributions to the Theory of Sequential Analysis. I
 M. A. Girshick, 1946
 Contributions to the Theory of Sequential Analysis, II, III
 M. A. Girshick, 1946