Peter Stone's Selected Publications

Classified by TopicClassified by Publication TypeSorted by DateSorted by First Author Last NameClassified by Funding Source


Efficient Selection of Multiple Bandit Arms: Theory and Practice

Efficient Selection of Multiple Bandit Arms: Theory and Practice.
Shivaram Kalyanakrishnan and Peter Stone.
In Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML), 2010.

Download

[PDF]257.1kB  [postscript]679.3kB  

Abstract

We consider the general, widely applicable problem of selecting from $n$ real-valued random variables a subset of size $m$ of those with the highest means, based on as few samples as possible. This problem, which we denote \textscExplore-$m$, is a core aspect in several stochastic optimization algorithms, and applications of simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of $n$ random variables (\textscExplore-$1$). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.

BibTeX Entry

@InProceedings{ICML10-kalyanakrishnan,
	author = "Shivaram Kalyanakrishnan and Peter Stone",
	title = "Efficient Selection of Multiple Bandit Arms: Theory and Practice",
	booktitle = "Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML)",
	year = "2010",
	abstract = {We consider the general, widely applicable problem of selecting from $n$ real-valued random variables a subset of size $m$ of those with the highest means, based on as few samples as possible. This problem, which we denote \textsc{Explore}-$m$, is a core aspect in several stochastic optimization algorithms, and applications of simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of $n$ random variables (\textsc{Explore}-$1$). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.},
}

Generated by bib2html.pl (written by Patrick Riley ) on Wed Jun 10, 2026 15:26:46