Upper confidence weighted learning for efficient exploration in multiclass prediction with binary feedback

Authors:
Hung Ngo; Matthew Luciw; Ngo Anh Vien; Jurgen Schmidhuber
Affiliations:
IDSIA, Galleria 2, Switzerland; IDSIA, Galleria 2, Switzerland; MLR Lab, University of Stuttgart, Stuttgart, Germany; IDSIA, Galleria 2, Switzerland
Venue:
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Year:
2013

Abstract

We introduce Upper Confidence Weighted Learning (UCWL), a novel algorithm for online multiclass learning from binary feedback. UCWL combines the Upper Confidence Bound (UCB) framework with the Soft Confidence Weighted (SCW) online learning scheme, and achieves state-of-the-art performance, especially on noisy and nonseparable data, at low computational cost. Estimated confidence intervals drive informed exploration, which enables faster learning than either uninformed exploration or no exploration at all. The target application is human-robot interaction (HRI), in which a robot learns to classify its observations while a human teacher provides only binary feedback (e.g., right/wrong). Results from an HRI experiment and two benchmark datasets show that UCWL outperforms other algorithms in the online binary-feedback setting and, surprisingly, sometimes even beats state-of-the-art algorithms that receive full feedback on the same data.
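The abstract does not give the UCWL update rule, so the following is only a rough sketch of the general idea it describes: score each class by a mean estimate plus a confidence width, predict the class with the highest upper bound, and learn from a right/wrong signal alone. The SCW update is not reproduced here; a ridge-regression (LinUCB-style) update is swapped in as a stand-in. The names `UCBLinearBandit`, `alpha`, and `run_toy_stream`, and the one-hot toy data, are hypothetical choices for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class UCBLinearBandit:
    """Multiclass learner with binary feedback (illustrative stand-in, not UCWL).

    Each class keeps a ridge-regression estimate `mu` and an inverse
    design matrix `sigma`; prediction adds a confidence bonus to the mean.
    """

    def __init__(self, n_classes, n_features, alpha=1.0):
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        self.sigma = [
            [[1.0 if i == j else 0.0 for j in range(n_features)]
             for i in range(n_features)]
            for _ in range(n_classes)
        ]
        self.b = [[0.0] * n_features for _ in range(n_classes)]
        self.mu = [[0.0] * n_features for _ in range(n_classes)]

    def predict(self, x):
        """Return the class with the largest upper confidence bound."""
        best_class, best_score = 0, -math.inf
        for k in range(len(self.mu)):
            sigma_x = [dot(row, x) for row in self.sigma[k]]
            width = math.sqrt(max(dot(x, sigma_x), 0.0))
            score = dot(self.mu[k], x) + self.alpha * width
            if score > best_score:
                best_class, best_score = k, score
        return best_class

    def update(self, x, k, correct):
        """Update only the predicted class k from right/wrong feedback."""
        reward = 1.0 if correct else -1.0
        sigma = self.sigma[k]
        sigma_x = [dot(row, x) for row in sigma]
        denom = 1.0 + dot(x, sigma_x)
        # Sherman-Morrison rank-one update of the inverse matrix.
        for i in range(len(x)):
            for j in range(len(x)):
                sigma[i][j] -= sigma_x[i] * sigma_x[j] / denom
        self.b[k] = [bi + reward * xi for bi, xi in zip(self.b[k], x)]
        self.mu[k] = [dot(row, self.b[k]) for row in sigma]


def run_toy_stream(n_rounds=300, n_classes=3, seed=0):
    """Run the learner on one-hot toy inputs and count wrong predictions."""
    rng = random.Random(seed)
    learner = UCBLinearBandit(n_classes, n_classes, alpha=1.0)
    mistakes = 0
    for _ in range(n_rounds):
        label = rng.randrange(n_classes)
        x = [1.0 if i == label else 0.0 for i in range(n_classes)]
        guess = learner.predict(x)
        correct = guess == label
        learner.update(x, guess, correct)  # the true label is never revealed
        if not correct:
            mistakes += 1
    return mistakes
```

In this sketch the confidence width shrinks for a class each time it is tried on similar inputs, so untried classes keep a larger bonus and get explored before the learner settles. The toy stream is deliberately trivial (noise-free, one-hot inputs) and says nothing about performance on the datasets used in the paper.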