Pool-based active training of a generative classifier with the selection strategy 4DS

  • Authors:
  • Tobias Reitmaier and Bernhard Sick

  • Affiliation:
  • Intelligent Embedded Systems Lab, University of Kassel, Kassel, Germany

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

In this article, we introduce and investigate 4DS, a new selection strategy for pool-based active training of a generative classifier, namely CMM (classifier based on a probabilistic mixture model). Such a generative classifier aims at modeling the processes underlying the "generation" of the data. 4DS considers the distance of samples (observations) to the decision boundary, the density in the regions where samples are selected, the diversity of the samples in the query set chosen for labeling, and, indirectly, the unknown class distribution of the samples, by utilizing the responsibilities of the model components for these samples. The combination of the four measures in 4DS is self-optimizing in the sense that the weights of the distance, density, and class-distribution measures depend on the currently estimated performance of the classifier. On 17 benchmark data sets, it is shown that 4DS outperforms a random selection strategy (the baseline method), a pure closest-sampling approach, ITDS (information-theoretic diversity sampling), DWUS (density-weighted uncertainty sampling), DUAL (dual strategy for active learning), PBAC (prototype-based active learning), and 3DS (a technique we proposed earlier that does not consider responsibility information) with respect to various evaluation criteria, such as ranked performance based on classification accuracy, number of labeled samples (data utilization), and learning speed assessed by the area under the learning curve. It is also shown that, due to the use of responsibility information, 4DS solves a key problem of active learning: the class distribution of the samples chosen for labeling approximates the unknown "true" class distribution of the overall data set quite well.
With this article, we also pave the way for advanced selection strategies for the active training of discriminative classifiers such as support vector machines or decision trees: we show that responsibility information derived from generative models can successfully be employed to improve the training of such classifiers.
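The self-optimizing combination described in the abstract can be illustrated with a small sketch. The exact weighting scheme below is an assumption for illustration, not the published 4DS formula: it merely shows how per-sample distance, density, diversity, and class-distribution (responsibility) scores could be blended, with the weights of the distance, density, and class-distribution terms adapted to the currently estimated classifier accuracy.

```python
def select_query(distance, density, diversity, resp_dist, accuracy):
    """Hypothetical sketch of a 4DS-style weighted selection score.

    Each argument is a list of per-sample scores in [0, 1], where higher
    means "more worth labeling" under that criterion.  `resp_dist` stands
    in for the class-distribution information derived from the mixture
    model's responsibilities.  The weighting below is an assumed scheme:
    when the classifier is still weak (low accuracy), the exploratory
    density and class-distribution terms dominate; as accuracy improves,
    the distance-to-boundary term takes over.  Diversity is always added
    so the query set does not collapse onto near-duplicate samples.
    """
    w_dist = accuracy                  # exploit the decision boundary
    w_dens = (1.0 - accuracy) / 2.0    # explore dense regions early on
    w_resp = (1.0 - accuracy) / 2.0    # match the estimated class distribution
    scores = [
        w_dist * d + w_dens * p + w_resp * r + v
        for d, p, v, r in zip(distance, density, diversity, resp_dist)
    ]
    # Return the index of the unlabeled sample with the highest score.
    return scores.index(max(scores))
```

With a weak classifier (accuracy 0.1), a sample in a dense, under-represented region wins; with a strong one (accuracy 0.9), the sample closest to the decision boundary wins, mirroring the exploration-to-exploitation shift the abstract attributes to the self-optimizing weights.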