Finite-time Analysis of the Multiarmed Bandit Problem
Machine Learning
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research
The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
The Journal of Machine Learning Research
Prediction, Learning, and Games
Prediction, Learning, and Games
The Journal of Machine Learning Research
Efficient bandit algorithms for online multiclass prediction
Proceedings of the 25th international conference on Machine learning
A contextual-bandit approach to personalized news article recommendation
Proceedings of the 19th international conference on World wide web
Exploitation and exploration in a performance based contextual advertising system
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Multi-armed bandit algorithms and empirical evaluation
ECML'05 Proceedings of the 16th European conference on Machine Learning
We study multi-class bandit prediction, an online learning problem in which the learner receives only partial feedback in each trial: whether the predicted class label is correct. Trading off exploration against exploitation is a well-known technique for online learning with incomplete feedback (i.e., the bandit setup). Banditron [8], a multi-class online learning algorithm for the bandit setting, maximizes the run-time gain by balancing exploration and exploitation with a fixed tradeoff parameter. The performance of Banditron can be quite sensitive to the choice of this parameter, so effective algorithms that tune it automatically are desirable. In this paper, we propose three learning strategies that automatically adjust the tradeoff parameter for Banditron. Our extensive empirical study on multiple real-world data sets verifies the efficacy of the proposed approach in learning the exploration vs. exploitation tradeoff parameter.
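To make the role of the fixed tradeoff parameter concrete, the following is a minimal sketch of a single Banditron round, written from the commonly cited form of the algorithm in [8] rather than from this abstract. The function name `banditron_step`, the argument names, and the list-of-lists weight layout are assumptions chosen for illustration. The parameter `gamma` is the exploration rate: with probability mass `1 - gamma` the learner exploits its current prediction, and the remaining `gamma` is spread uniformly over all classes. The three tuning strategies proposed in the paper are not described in the abstract, so they are not shown.

```python
import random


def banditron_step(W, x, y_true, gamma, rng):
    """Run one Banditron round and update W in place.

    W      : list of k rows (one weight vector per class), each of length d
    x      : feature vector of length d
    y_true : true label, used ONLY to compute the one-bit bandit feedback
    gamma  : exploration rate in [0, 1] (the fixed tradeoff parameter)
    rng    : random.Random instance (assumed here for reproducibility)
    """
    k = len(W)

    # Exploit: the class with the highest linear score.
    scores = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    y_hat = max(range(k), key=lambda r: scores[r])

    # Explore: mix the greedy choice with a uniform distribution.
    probs = [(1.0 - gamma) * (r == y_hat) + gamma / k for r in range(k)]
    y_tilde = rng.choices(range(k), weights=probs)[0]

    # Bandit feedback: the learner only sees whether its guess was right.
    correct = (y_tilde == y_true)

    # Importance-weighted update: an unbiased estimate of the
    # full-information Perceptron update.
    for j, x_j in enumerate(x):
        if correct:
            W[y_tilde][j] += x_j / probs[y_tilde]
        W[y_hat][j] -= x_j

    return y_tilde, correct


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[0.0, 0.0] for _ in range(3)]  # 3 classes, 2 features
    label, ok = banditron_step(W, [1.0, 0.0], y_true=1, gamma=0.1, rng=rng)
    print(label, ok)
```

A small `gamma` makes the learner rarely discover the correct label after a mistake, while a large `gamma` wastes many rounds on random guesses; this is the sensitivity that motivates tuning the parameter automatically.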