PAC Subset Selection in Stochastic Multi-armed Bandits (2012)
We consider the problem of selecting, from among the arms of a stochastic n-armed bandit, a subset of size m of the arms with the highest expected rewards, based on efficiently sampling the arms. This "subset selection" problem finds application in a variety of areas. In the authors' previous work (Kalyanakrishnan and Stone, 2010), this problem is framed under a PAC setting (denoted "Explore-m"), and corresponding sampling algorithms are analyzed. Whereas the formal analysis therein is restricted to the worst-case sample complexity of algorithms, in this paper we design and analyze an algorithm ("LUCB") with improved expected sample complexity. Interestingly, LUCB bears a close resemblance to the well-known UCB algorithm for regret minimization. The expected sample complexity bound we show for LUCB is novel even for single-arm selection (Explore-1). We also give a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m.
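The abstract describes LUCB only at a high level; as a rough illustration, the following minimal Python sketch implements an LUCB-style sampler for Explore-m. It samples, in each round, the two "critical" arms: the weakest lower-confidence arm inside the empirical top-m and the strongest upper-confidence arm outside it, stopping when their confidence intervals separate to within eps. The pull callback, the function name lucb, and the exact confidence-radius constant are assumptions of this sketch, not details taken from this page.

import math

def lucb(pull, n, m, eps, delta):
    """Illustrative LUCB-style sampler (assumes 1 <= m < n).

    pull(i) returns a reward in [0, 1] for arm i; eps and delta are
    the PAC parameters.  The confidence radius below is one plausible
    choice, not necessarily the exact form analyzed in the paper.
    """
    counts = [0] * n
    sums = [0.0] * n

    def mean(i):
        return sums[i] / counts[i]

    def beta(i, t):
        # Confidence radius for arm i after counts[i] pulls, round t.
        return math.sqrt(math.log(1.25 * n * t**4 / delta) / (2 * counts[i]))

    # Initialization: pull every arm once.
    for i in range(n):
        sums[i] += pull(i)
        counts[i] = 1

    t = 1
    while True:
        order = sorted(range(n), key=mean, reverse=True)
        top, rest = order[:m], order[m:]
        # Critical arms: lowest LCB in the empirical top-m,
        # highest UCB among the remaining arms.
        h = min(top, key=lambda i: mean(i) - beta(i, t))
        l = max(rest, key=lambda i: mean(i) + beta(i, t))
        if (mean(l) + beta(l, t)) - (mean(h) - beta(h, t)) < eps:
            return top
        for i in (h, l):
            sums[i] += pull(i)
            counts[i] += 1
        t += 1

For instance, with Bernoulli arms of (hypothetical) means p, one could call lucb(lambda i: float(random.random() < p[i]), n=len(p), m=2, eps=0.1, delta=0.1) after importing random.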
View:
PDF, PS, HTML
Citation:
In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), June-July 2012.
Shivaram Kalyanakrishnan, Ph.D. Alumnus (shivaram [at] cs utexas edu)
Peter Stone, Faculty (pstone [at] cs utexas edu)