Combining committee-based semi-supervised learning and active learning

Authors:
Mohamed Farouk Abdel Hady;Friedhelm Schwenker
Affiliations:
Institute of Neural Information Processing, University of Ulm, Ulm, Germany;Institute of Neural Information Processing, University of Ulm, Ulm, Germany
Venue:
Journal of Computer Science and Technology
Year:
2010

Citing 24
Cited 3

A sequential algorithm for training text classifiers

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Bagging predictors

Machine Learning
Selective Sampling Using the Query by Committee Algorithm

Machine Learning
Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
The Random Subspace Method for Constructing Decision Forests

IEEE Transactions on Pattern Analysis and Machine Intelligence
Data mining: practical machine learning tools and techniques with Java implementations

Data mining: practical machine learning tools and techniques with Java implementations
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Analyzing the effectiveness and applicability of co-training

Proceedings of the ninth international conference on Information and knowledge management
Active + Semi-supervised Learning = Robust Multi-View Learning

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Enhancing Supervised Learning with Unlabeled Data

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Nearest Neighbors in Random Subspaces

SSPR '98/SPR '98 Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition
Selective Sampling with Redundant Views

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Employing EM and Pool-Based Active Learning for Text Classification

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Tree Induction for Probability-Based Ranking

Machine Learning
Email classification with co-training

CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Unsupervised Improvement of Visual Detectors using Co-Training

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Democratic Co-Learning

ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Tri-Training: Exploiting Unlabeled Data Using Three Classifiers

IEEE Transactions on Knowledge and Data Engineering
Improve Decision Trees for Probability-Based Ranking by Lazy Learners

ICTAI '06 Proceedings of the 18th IEEE International Conference on Tools with Artificial Intelligence
Analyzing Co-training Style Algorithms

ECML '07 Proceedings of the 18th European conference on Machine Learning
When Semi-supervised Learning Meets Ensemble Learning

MCS '09 Proceedings of the 8th International Workshop on Multiple Classifier Systems
Semi-supervised learning by disagreement

Knowledge and Information Systems
Semi-supervised multiple classifier systems: background and research directions

MCS'05 Proceedings of the 6th international conference on Multiple Classifier Systems
Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Combining active learning and semi-supervised for improving learning performance

Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies
A semi-supervised feature ranking method with ensemble learning

Pattern Recognition Letters
Pattern classification and clustering: A review of partially supervised learning approaches

Pattern Recognition Letters

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many data mining applications have a large amount of data but labeling data is usually difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that has the assumptions that each example is represented by multiple sets of features (views) and these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC) is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on its neighborhood. Then we introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied on both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non committee-based ones.