Querying discriminative and representative samples for batch mode active learning

Authors:
Zheng Wang;Jieping Ye
Affiliations:
Arizona State University, Tempe, AZ, USA;Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

Empirical risk minimization (ERM) provides a useful guideline for many machine learning and data mining algorithms. Under the ERM principle, one minimizes an upper bound of the true risk, which is approximated by the sum of the empirical risk and the complexity of the candidate classifier class. To guarantee satisfactory learning performance, ERM requires that the training data be sampled i.i.d. from the unknown source distribution. However, this may not hold in active learning, where one selects the most informative samples to label, and these samples may not follow the source distribution. In this paper, we generalize the empirical risk minimization principle to the active learning setting. We derive a novel upper bound for the true risk in the active learning setting; by minimizing this upper bound we develop a practical batch mode active learning method. The proposed formulation involves a non-convex integer programming problem, which we solve efficiently by an alternating optimization method. Our method is shown to query the most informative samples while preserving the source distribution as much as possible, thus identifying the most uncertain and representative queries. Experiments on benchmark data sets and real-world applications demonstrate the superior performance of our proposed method in comparison with state-of-the-art methods.
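To make the idea of querying samples that are both uncertain and representative more concrete, here is a minimal sketch. It is not the formulation from the paper: the abstract describes a non-convex integer program solved by alternating optimization, whereas this sketch swaps in a simple greedy heuristic. It also assumes that representativeness is measured by a kernel maximum mean discrepancy (MMD) between the selected set and the full pool, which the abstract does not state. The names `rbf_kernel`, `mmd2`, `select_batch`, `uncertainty`, and `trade_off` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(K, idx):
    """Biased squared MMD between the subset `idx` and the whole pool.

    K is the kernel matrix of the whole pool (assumption: pool fits in memory).
    """
    idx = np.asarray(idx)
    within = K[np.ix_(idx, idx)].mean()   # subset vs. subset
    cross = K[idx].mean()                 # subset vs. pool
    pool = K.mean()                       # pool vs. pool
    return within - 2.0 * cross + pool


def select_batch(K, uncertainty, labeled, batch_size, trade_off=1.0):
    """Greedy stand-in for the paper's optimization (illustrative only).

    At each step, add the candidate that minimizes
        trade_off * MMD^2(labeled + batch + candidate, pool) - uncertainty[candidate],
    so that high-uncertainty points are preferred while the selected set
    stays close to the pool distribution.
    """
    chosen = list(labeled)
    candidates = set(range(K.shape[0])) - set(chosen)
    batch = []
    for _ in range(batch_size):
        best = min(
            candidates,
            key=lambda i: trade_off * mmd2(K, chosen + [i]) - uncertainty[i],
        )
        batch.append(best)
        chosen.append(best)
        candidates.remove(best)
    return batch


# Toy pool with two well-separated clusters; index 0 is already labeled.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
K = rbf_kernel(X, X)
flat = np.zeros(len(X))  # no uncertainty signal: representativeness decides
batch = select_batch(K, flat, labeled=[0], batch_size=2)
```

With a flat uncertainty score, the first query in this toy example comes from the cluster that has no labeled point yet, since that choice brings the selected set closest to the pool distribution. Raising `uncertainty` for a particular point, or lowering `trade_off`, shifts the choice toward the most uncertain candidates instead.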