Shivaram's Reading List


Bandits

Almost Optimal Exploration in Multi-Armed Bandits
Zohar Karnin, Tomer Koren, and Oren Somekh, 2013

Information Complexity in Bandit Subset Selection
Emilie Kaufmann and Shivaram Kalyanakrishnan, 2013

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Victor Gabillon, Mohammad Ghavamzadeh, and Alessandro Lazaric, 2012

Planning in Reward-Rich Domains via PAC Bandits
Sergiu Goschin, Ari Weinstein, Michael L. Littman, and Erick Chastain, 2012

Best Arm Identification in Multi-Armed Bandits
Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos, 2010

UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem
Peter Auer and Ronald Ortner, 2010

Simulation optimization using the cross-entropy method with optimal computing budget allocation
Donghai He, Loo Hay Lee, Chun-Hung Chen, Michael C. Fu, and Segev Wasserkrug, 2010

An Asymptotically Optimal Bandit Algorithm for Bounded Support Models
Junya Honda and Akimichi Takemura, 2010

Non-Stochastic Bandit Slate Problems
Satyen Kale, Lev Reyzin, and Robert E. Schapire, 2010

Efficient Selection of Multiple Bandit Arms: Theory and Practice
Shivaram Kalyanakrishnan and Peter Stone, 2010

Regret bounds for sleeping experts and bandits
Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma, 2010

A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire, 2010

$\epsilon$-First Policies for Budget-Limited Multi-Armed Bandits
Long Tran-Thanh, Archie Chapman, Enrique Munoz de Cote, Alex Rogers, and Nicholas R. Jennings, 2010

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2009

Pure Exploration in Multi-armed Bandits Problems
Sébastien Bubeck, Rémi Munos, and Gilles Stoltz, 2009

Combinatorial Bandits
Nicolò Cesa-Bianchi and Gábor Lugosi, 2009

Efficient Simulation Budget Allocation for Selecting an Optimal Subset
Chun-Hung Chen, Donghai He, Michael Fu, and Loo Hay Lee, 2008

Multi-armed bandits in metric spaces
Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal, 2008

Empirical Bernstein stopping
Volodymyr Mnih, Csaba Szepesvári, and Jean-Yves Audibert, 2008

Tuning Bandit Algorithms in Stochastic Environments
Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári, 2007

Approximation Algorithms for Budgeted Learning Problems
Sudipto Guha and Kamesh Munagala, 2007

Recent advances in ranking and selection
Seong-Hee Kim and Barry L. Nelson, 2007

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2006

Bandit Based Monte-Carlo Planning
Levente Kocsis and Csaba Szepesvári, 2006

Active Model Selection
Omid Madani, Daniel J. Lizotte, and Russell Greiner, 2004

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2004

Using Ranking and Selection to “Clean Up” after Simulation Optimization
Justin Boesel, Barry L. Nelson, and Seong-Hee Kim, 2003

Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem
Shie Mannor and John N. Tsitsiklis, 2003

Finite-time Analysis of the Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer, 2002

The Nonstochastic Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire, 2002

PAC Bounds for Multi-armed Bandit and Markov Decision Processes
Eyal Even-Dar, Shie Mannor, and Yishay Mansour, 2002

Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations
Shanti S. Gupta and S. Panchapakesan, 2002

Mining complex models from arbitrarily large databases in constant time
Geoff Hulten and Pedro Domingos, 2002

A fully sequential procedure for indifference-zone selection in simulation
Seong-Hee Kim and Barry L. Nelson, 2001

Mining high-speed data streams
Pedro Domingos and Geoff Hulten, 2000

Selecting and Ordering Populations: A New Statistical Methodology
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, 1999

An empirical evaluation of several methods to select the best system
Koichiro Inoue, Stephen E. Chick, and Chun-Hung Chen, 1999

Design and analysis of experiments for statistical selection, screening, and multiple comparisons
Robert E. Bechhofer, Thomas J. Santner, and David M. Goldsman, 1995

Sequential PAC Learning
Dale Schuurmans and Russell Greiner, 1995

Restricted Subset Selection Procedures for Simulation
David W. Sullivan and James R. Wilson, 1989

Bandit problems
Donald A. Berry and Bert Fristedt, 1985

A procedure for selecting a subset of size $m$ containing the $l$ best of $k$ independent normal populations, with applications to simulation
Lloyd W. Koenig and Averill M. Law, 1985

Asymptotically Efficient Adaptive Allocation Rules
T. L. Lai and Herbert Robbins, 1985

Determining Sample Size for Pretesting Comparative Effectiveness of Advertising Copies
Siddhartha R. Dalal and V. Srinivasan, 1977

Sequential models for clinical trials
Herman Chernoff, 1967

A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations
Edward Paulson, 1964

Probability Inequalities for Sums of Bounded Random Variables
Wassily Hoeffding, 1963

Comparing entries in random sample tests
W. A. Becker, 1961

A Sequential Multiple-Decision Procedure for Selecting the Best One of Several Normal Populations with a Common Unknown Variance, and Its Use with Various Experimental Designs
Robert E. Bechhofer, 1958

Some aspects of the sequential design of experiments
Herbert Robbins, 1952

Sequential Analysis
Abraham Wald, 1947

Contributions to the Theory of Sequential Analysis. I
M. A. Girshick, 1946

Contributions to the Theory of Sequential Analysis, II, III
M. A. Girshick, 1946