Unbiased online active learning in data streams

Authors:
Wei Chu; Martin Zinkevich; Lihong Li; Achint Thomas; Belle Tseng
Affiliations:
Microsoft, Redmond, USA; Yahoo! Labs, Sunnyvale, USA; Yahoo! Labs, Sunnyvale, USA; Yahoo! Labs, Sunnyvale, USA; Yahoo! Labs, Sunnyvale, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Unlabeled samples can be selected intelligently for labeling so as to minimize classification error. In many real-world applications, large numbers of unlabeled samples arrive as a stream, which makes it impossible to keep all the data in a candidate pool. In this work, we focus on binary classification problems and study selective labeling in data streams, where a decision must be made on each sample sequentially. We require the sampling process to be unbiased and design optimal instrumental distributions that minimize the variance of the stochastic process. Meanwhile, the parameters of Bayesian linear classifiers are estimated online by weighted maximum likelihood. For the empirical evaluation, we collect a data stream of user-generated comments on a commercial news portal over 30 consecutive days and carry out an offline evaluation comparing several sampling strategies: unbiased active learning, biased variants, and random sampling. Experimental results verify the usefulness of online active learning, especially in non-stationary settings with concept drift.
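
The general idea of unbiased selective labeling on a stream can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the form of the instrumental distribution, so `query_probability` below is a hypothetical choice, and plain logistic regression trained by stochastic gradient steps stands in for the Bayesian linear classifier. What the sketch does show is the importance-weighting step: each sample is queried with probability q, and a queried sample enters the weighted likelihood with weight 1/q, which keeps the weighted objective unbiased in expectation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def query_probability(margin, floor=0.1, scale=1.0):
    # Hypothetical instrumental distribution (an assumption, not from the paper):
    # query more often near the decision boundary. The floor keeps q > 0 so the
    # importance weight 1/q stays finite.
    return max(floor, math.exp(-scale * abs(margin)))


def run_stream(stream, dim, lr=0.1, seed=0):
    # stream: iterable of (features, label) pairs with label in {0, 1}.
    # The label is only used when the sample is selected for labeling.
    rng = random.Random(seed)
    w = [0.0] * dim
    n_queried = 0
    for x, y in stream:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        q = query_probability(margin)
        if rng.random() < q:
            n_queried += 1
            weight = 1.0 / q  # importance weight that corrects the sampling bias
            p = sigmoid(margin)
            # One gradient step on the importance-weighted log-likelihood.
            for i in range(dim):
                w[i] += lr * weight * (y - p) * x[i]
    return w, n_queried


if __name__ == "__main__":
    rng = random.Random(1)
    data = []
    for _ in range(2000):
        u = rng.uniform(-1.0, 1.0)
        data.append(([1.0, u], 1 if u > 0 else 0))
    weights, queried = run_stream(data, dim=2)
    print("queried", queried, "of", len(data), "samples; weights:", weights)
```

Dropping the `weight` factor would give a biased variant of the kind the abstract lists as a comparison strategy, and fixing `q` to a constant would reduce the sketch to random sampling.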