B-EM: a classifier incorporating bootstrap with EM approach for data mining
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
This paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy. This is significant for many applications, such as image classification, where obtaining classification labels is expensive while large amounts of unlabeled examples are easily available. We investigate an Expectation-Maximization (EM) algorithm for learning from labeled and unlabeled data. Unlabeled data boosts learning accuracy because it provides information about the joint probability distribution. A theoretical argument shows that the more unlabeled examples are combined in learning, the more accurate the result becomes. We then introduce the B-EM algorithm, which combines EM with the bootstrap method to exploit large unlabeled data sets while avoiding prohibitive I/O cost. Experimental results on both synthetic and real data sets show that the proposed approach achieves satisfactory performance.
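The EM procedure the abstract describes can be sketched as follows. This is a minimal illustration assuming a two-class, one-dimensional Gaussian model: labeled points keep their hard labels in the E-step, while each unlabeled point contributes soft class posteriors that feed the M-step re-estimation. The model choice, function names, and initialization here are illustrative assumptions, not the paper's actual classifier, and the bootstrap component of B-EM is not shown.

```python
import math
import random

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def semi_supervised_em(labeled, unlabeled, n_iter=50):
    """Fit a two-class 1-D Gaussian classifier with EM on labeled + unlabeled data.

    labeled   -- list of (x, y) pairs, y in {0, 1}
    unlabeled -- list of x values with no labels
    Returns per-class means, standard deviations, and priors.
    """
    mus, sigmas, priors = [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]
    # Initialize each class from the labeled examples alone.
    for c in (0, 1):
        xs = [x for x, y in labeled if y == c]
        mus[c] = sum(xs) / len(xs)
        var = sum((x - mus[c]) ** 2 for x in xs) / len(xs)
        sigmas[c] = math.sqrt(max(var, 1e-6))
        priors[c] = len(xs) / len(labeled)

    for _ in range(n_iter):
        # E-step: labeled points keep their hard labels; each unlabeled
        # point gets a posterior probability of belonging to each class.
        resp = [(x, 1.0 - y, float(y)) for x, y in labeled]
        for x in unlabeled:
            p0 = priors[0] * gaussian_pdf(x, mus[0], sigmas[0])
            p1 = priors[1] * gaussian_pdf(x, mus[1], sigmas[1])
            z = p0 + p1
            resp.append((x, p0 / z, p1 / z))

        # M-step: re-estimate parameters from the (soft) class memberships.
        for c in (0, 1):
            total = sum(r[1 + c] for r in resp)
            mus[c] = sum(r[0] * r[1 + c] for r in resp) / total
            var = sum(r[1 + c] * (r[0] - mus[c]) ** 2 for r in resp) / total
            sigmas[c] = math.sqrt(max(var, 1e-6))
            priors[c] = total / len(resp)
    return mus, sigmas, priors

# A few labeled points plus many unlabeled points drawn from the same mixture.
random.seed(0)
labeled = [(random.gauss(0.0, 1.0), 0) for _ in range(5)] + \
          [(random.gauss(4.0, 1.0), 1) for _ in range(5)]
unlabeled = [random.gauss(0.0, 1.0) for _ in range(200)] + \
            [random.gauss(4.0, 1.0) for _ in range(200)]
mus, sigmas, priors = semi_supervised_em(labeled, unlabeled)
# mus should land close to the true class means (0 and 4): the 400
# unlabeled points sharpen estimates that 10 labeled points alone cannot.
```

The abstract's I/O argument suggests that, instead of one pass over the full `unlabeled` list per iteration, B-EM would run EM over bootstrap resamples of the unlabeled data; in this sketch that would mean replacing `unlabeled` with `random.choices(unlabeled, k=m)` for a subsample size `m` chosen to fit in memory.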