Exploiting unlabeled data in ensemble methods

Authors:
Kristin P. Bennett; Ayhan Demiriz; Richard Maclin
Affiliations:
Rensselaer Polytechnic Institute, Troy, NY; Verizon Inc., Irving, TX; University of Minnesota-Duluth, Duluth, MN
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Abstract

An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles from both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier from both the labeled and the pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space, as measured on both the labeled and the unlabeled data. Unlike alternative approaches, ASSEMBLE does not require the base classifier itself to be a semi-supervised learning method. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm, for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets, using both decision trees and neural networks as base classifiers, support the proposed method.
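The abstract describes the ASSEMBLE loop only at a high level. The following is a minimal, AdaBoost-flavoured sketch of that alternation (label unlabeled points with the current ensemble, then fit the next base classifier on labeled plus pseudo-labeled data), written under our own assumptions rather than taken from the paper: binary labels in {-1, +1}, weighted decision stumps standing in for "any cost-sensitive classification algorithm", an exponential margin cost, initial pseudo-classes taken from the nearest labeled point, and a hypothetical `beta` parameter that splits the cost mass between labeled and unlabeled points. It is an illustration, not the authors' reference implementation.

```python
"""ASSEMBLE-style semi-supervised boosting sketch (binary labels in {-1, +1}).

Assumptions (ours, not the paper's): decision stumps as the cost-sensitive
base learner, exponential margin cost, 1-nearest-neighbour initial
pseudo-classes, and a hypothetical `beta` splitting cost between labeled
and unlabeled points.
"""
import math


def stump_predict(stump, x):
    feat, thr, pol = stump
    return pol if x[feat] <= thr else -pol


def fit_stump(X, y, w):
    """Cost-sensitive base learner: stump with minimal weighted 0/1 error."""
    best, best_err = (0, X[0][0], 1), float("inf")
    for feat in range(len(X[0])):
        for thr in sorted({x[feat] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((feat, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (feat, thr, pol), err
    return best


def nearest_label(x, X_lab, y_lab):
    """Label of the closest labeled point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, xl)) for xl in X_lab]
    return y_lab[dists.index(min(dists))]


def assemble(X_lab, y_lab, X_unl, rounds=10, beta=0.9):
    l, u = len(X_lab), len(X_unl)
    X = list(X_lab) + list(X_unl)
    # Misclassification costs: labeled points share beta, unlabeled 1 - beta.
    cost = [beta / l] * l + [(1 - beta) / u] * u
    # Initial pseudo-classes for unlabeled points from the nearest labeled point.
    pseudo = [nearest_label(x, X_lab, y_lab) for x in X_unl]
    ensemble = []            # list of (weight, stump)
    scores = [0.0] * (l + u)  # ensemble output F(x_i) on every point
    D = cost[:]              # current weighting (sums to 1)
    for _ in range(rounds):
        y = list(y_lab) + pseudo
        stump = fit_stump(X, y, D)
        preds = [stump_predict(stump, x) for x in X]
        eps = sum(d for d, p, yi in zip(D, preds, y) if p != yi)
        if eps >= 0.5:
            break  # no better than chance on the weighted sample
        alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-10))
        ensemble.append((alpha, stump))
        if eps == 0.0:
            break  # perfect on labeled + pseudo-labeled data: stop early
        scores = [s + alpha * p for s, p in zip(scores, preds)]
        # Re-assign pseudo-classes using the current ensemble.
        pseudo = [1 if s >= 0 else -1 for s in scores[l:]]
        y = list(y_lab) + pseudo
        # Exponential margin cost on labeled and pseudo-labeled points.
        raw = [c * math.exp(-yi * s) for c, yi, s in zip(cost, y, scores)]
        z = sum(raw)
        D = [r / z for r in raw]
    return ensemble


def predict(ensemble, x):
    s = sum(a * stump_predict(st, x) for a, st in ensemble)
    return 1 if s >= 0 else -1
```

For example, `assemble([[0.0], [1.0], [9.0], [10.0]], [-1, -1, 1, 1], [[2.0], [3.0], [7.0], [8.0]])` lets the unlabeled points at 2 and 3 (pseudo-labeled -1) and at 7 and 8 (pseudo-labeled +1) constrain where the first stump splits. Because the base learner only ever sees weighted labeled and pseudo-labeled points, any classifier that accepts per-example costs could replace `fit_stump`, which is the property the abstract emphasizes.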