UTCS Artificial Intelligence
courses
talks/events
demos
people
projects
publications
software/data
labs
areas
admin
PAC Subset Selection in Stochastioc Multi-armed Bandits (2012)
Shivaram Kalyanakrishnan
, Ambuj Tewari,
Peter Auer, and Peter Stone
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of those arms with the highest expected rewards, based on efficiently sampling the arms. This ``subset selection'' problem finds application in a variety of areas. In the authors' previous work (Kalyanakrishnan+Stone:2010-bandit), this problem is framed under a PAC setting (denoted ``Explore-m''), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst case sample complexity of algorithms, in this paper, we design and analyze an algorithm (``LUCB'') with improved expected sample complexity. Interestingly LUCB bears a close resemblance to the well-known UCB algorithm for regret minimization. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst case sample complexity of PAC algorithms for Explore-m.
View:
PDF
,
PS
,
HTML
Citation:
In
In proceedings of the 29th International Conference on Machine Learning (ICML 2012)
, June-July 2012.
Bibtex:
@inproceedings{ICML12-shivaram, title={PAC Subset Selection in Stochastioc Multi-armed Bandits}, author={Shivaram Kalyanakrishnan and Ambuj Tewari and Peter Auer and Peter Stone}, booktitle={In proceedings of the 29th International Conference on Machine Learning (ICML 2012)}, month={June-July}, url="http://www.cs.utexas.edu/users/ai-lab?ICML12-shivaram", year={2012} }
People
Shivaram Kalyanakrishnan
Ph.D. Alumni
shivaram [at] cs utexas edu
Peter Stone
Faculty
pstone [at] cs utexas edu
Areas of Interest
Machine Learning
Labs
Learning Agents