Learning to trade off between exploration and exploitation in multiclass bandit prediction
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
We present a new multiclass algorithm in the bandit framework, where, after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit indicating whether the predicted label is correct, rather than the true label itself. Our algorithm is based on the second-order Perceptron and uses upper-confidence bounds to trade off exploration and exploitation, instead of the random sampling performed by most current algorithms. We analyze this algorithm in a partially adversarial setting, where instances are chosen adversarially, while labels are drawn from a linear probabilistic model that is also chosen adversarially. We show a regret of $\mathcal{O}(\sqrt{T}\log T)$, which improves on the current best bound of $\mathcal{O}(T^{2/3})$ in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems and on four vowel recognition tasks, often obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
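
Since the abstract only sketches the method, the following is a minimal, hypothetical Python sketch of the general idea: a per-class second-order (regularized least-squares) estimator whose prediction score adds a UCB-style confidence width, updated from the single bit of bandit feedback. The class name BanditUCBPerceptron, the exploration scale alpha, and the exact update rule are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    class BanditUCBPerceptron:
        """Sketch of a second-order bandit multiclass learner with
        upper-confidence exploration. One regularized least-squares
        estimator per class; the predicted class maximizes
        score + confidence width. Only one bit of feedback
        (was the prediction correct?) drives the updates.
        Illustrative only, not the paper's exact algorithm."""

        def __init__(self, n_classes, dim, alpha=1.0):
            self.alpha = alpha  # exploration scale (an assumption here)
            # Inverse correlation matrix and signed feature sum per class.
            self.A_inv = [np.eye(dim) for _ in range(n_classes)]
            self.b = [np.zeros(dim) for _ in range(n_classes)]

        def predict(self, x):
            scores = []
            for A_inv, b in zip(self.A_inv, self.b):
                w = A_inv @ b  # second-order weight estimate
                width = self.alpha * np.sqrt(x @ A_inv @ x)  # UCB term
                scores.append(w @ x + width)
            return int(np.argmax(scores))

        def update(self, x, y_hat, correct):
            # Sherman-Morrison rank-one update of the predicted
            # class's inverse correlation matrix.
            A_inv = self.A_inv[y_hat]
            Ax = A_inv @ x
            self.A_inv[y_hat] = A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
            # The 1-bit feedback enters as a signed target for the
            # predicted class only; other classes are left untouched.
            self.b[y_hat] += (1.0 if correct else -1.0) * x

A synthetic usage loop (the hidden linear model W exists only to generate labels; the learner never sees the true label, only the correctness bit):

    rng = np.random.default_rng(0)
    learner = BanditUCBPerceptron(n_classes=3, dim=20)
    W = rng.standard_normal((3, 20))  # hidden label model (synthetic)
    mistakes = 0
    for t in range(1000):
        x = rng.standard_normal(20)
        y = int(np.argmax(W @ x))  # true label, never revealed
        y_hat = learner.predict(x)
        mistakes += (y_hat != y)
        learner.update(x, y_hat, correct=(y_hat == y))

The confidence width shrinks along directions the predicted class has seen often, which is what replaces random exploration in this style of algorithm.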