Multiclass classification with bandit feedback using adaptive regularization

Authors:
Koby Crammer; Claudio Gentile
Affiliations:
Department of Electrical Engineering, The Technion, Haifa, Israel 32000; DICOM, Università dell'Insubria, Varese, Italy 21100
Venue:
Machine Learning
Year:
2013


Abstract

We present a new multiclass algorithm in the bandit framework, where, after making a prediction, the learning algorithm receives only partial feedback: a single bit indicating whether the predicted label is correct, rather than the true label. Our algorithm is based on the second-order Perceptron and uses upper-confidence bounds to trade off exploration and exploitation, instead of the random sampling performed by most current algorithms. We analyze this algorithm in a partially adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves over the current best bounds of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
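
The combination the abstract describes (second-order statistics, an upper-confidence bonus, and one-bit feedback) can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `UCBSecondOrderBandit`, the exploration weight `alpha`, the per-class inverse matrices, and the choice of +1/-1 regression targets are all assumptions made here for illustration, since the abstract does not specify the update rule.

```python
import math


class UCBSecondOrderBandit:
    """Illustrative sketch only; names and update rule are assumptions."""

    def __init__(self, n_classes, dim, alpha=1.0):
        self.k = n_classes
        self.d = dim
        self.alpha = alpha  # hypothetical exploration weight
        # One inverse correlation matrix per class, initialised to identity.
        self.a_inv = [
            [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
            for _ in range(n_classes)
        ]
        # Per-class sums of (target * x); the weights are w_i = A_i^{-1} b_i.
        self.b = [[0.0] * dim for _ in range(n_classes)]

    @staticmethod
    def _matvec(m, x):
        return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

    def scores(self, x):
        """Estimated margin plus confidence width for every class."""
        out = []
        for i in range(self.k):
            ax = self._matvec(self.a_inv[i], x)  # A_i^{-1} x
            margin = sum(bj * aj for bj, aj in zip(self.b[i], ax))
            width = math.sqrt(max(sum(xj * aj for xj, aj in zip(x, ax)), 0.0))
            out.append(margin + self.alpha * width)
        return out

    def predict(self, x):
        s = self.scores(x)
        return max(range(self.k), key=s.__getitem__)

    def update(self, x, predicted, correct):
        """Use the single feedback bit to update the predicted class only."""
        a_inv = self.a_inv[predicted]
        ax = self._matvec(a_inv, x)
        denom = 1.0 + sum(xj * aj for xj, aj in zip(x, ax))
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1}.
        for r in range(self.d):
            for c in range(self.d):
                a_inv[r][c] -= ax[r] * ax[c] / denom
        target = 1.0 if correct else -1.0
        for j in range(self.d):
            self.b[predicted][j] += target * x[j]


def run_one_hot_demo(rounds=30, n_classes=3):
    """Toy run: the instance is the one-hot encoding of its own label."""
    learner = UCBSecondOrderBandit(n_classes, n_classes)
    mistakes = 0
    for t in range(rounds):
        y = t % n_classes
        x = [1.0 if j == y else 0.0 for j in range(n_classes)]
        y_hat = learner.predict(x)
        correct = y_hat == y  # the learner never sees y itself
        mistakes += 0 if correct else 1
        learner.update(x, y_hat, correct)
    return mistakes
```

In this toy run the learner errs only while it is still ruling out wrong labels for each instance; the confidence width shrinks for classes it has already tried, so exploration is driven by uncertainty rather than by random sampling.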