Unlabeled samples can be intelligently selected for labeling to minimize classification error. In many real-world applications, a large number of unlabeled samples arrive in a streaming manner, making it impossible to maintain them all in a candidate pool. In this work, we focus on binary classification and study selective labeling in data streams, where a decision must be made on each sample sequentially. We enforce unbiasedness in the sampling process and design optimal instrumental distributions that minimize the variance of the stochastic process. Meanwhile, Bayesian linear classifiers with weighted maximum likelihood are optimized online to estimate the model parameters. For empirical evaluation, we collect a data stream of user-generated comments on a commercial news portal over 30 consecutive days and carry out an offline comparison of several sampling strategies, including unbiased active learning, its biased variants, and random sampling. Experimental results verify the usefulness of online active learning, especially in non-stationary settings with concept drift.
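To make the idea concrete, here is a minimal sketch (not the authors' exact method) of unbiased online active learning on a stream: each arriving sample is queried for its label with an instrumental probability p_t that concentrates near the decision boundary, and a queried sample's gradient is reweighted by 1/p_t so the online update remains unbiased. The logistic model, the specific form of p_t, and the learning rate below are illustrative assumptions, not details from the paper.

```python
import numpy as np

class UnbiasedOnlineActiveLearner:
    """Importance-weighted online active learning for binary classification.

    Sketch under stated assumptions: a linear logistic model trained by SGD,
    with a query probability that decays with the absolute margin.
    """

    def __init__(self, dim, lr=0.1, p_min=0.05):
        self.w = np.zeros(dim)   # linear model weights
        self.lr = lr             # SGD learning rate
        self.p_min = p_min       # floor on query probability, bounds 1/p_t

    def _margin(self, x):
        return self.w @ x

    def query_prob(self, x):
        # Query more often near the decision boundary: p_t shrinks as |margin| grows.
        return max(self.p_min, min(1.0, 1.0 / (1.0 + abs(self._margin(x)))))

    def observe(self, x, get_label, rng):
        """Process one stream sample; returns True if a label was requested."""
        p = self.query_prob(x)
        if rng.random() < p:
            y = get_label()  # label in {-1, +1}, requested from an oracle
            # Importance-weighted logistic gradient step (weight 1/p keeps
            # the expected update equal to the full-information update).
            z = y * self._margin(x)
            grad = -y * x / (1.0 + np.exp(z))
            self.w -= self.lr * (1.0 / p) * grad
            return True
        return False
```

On a synthetic linearly separable stream, this learner typically recovers the true separating direction while querying only a fraction of the labels; the 1/p reweighting is what keeps the sequence of updates unbiased despite the selective sampling.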