Classification algorithms
On the exponential value of labeled samples
Pattern Recognition Letters
Bayesian classification (AutoClass): theory and results
Advances in knowledge discovery and data mining
BOAT—optimistic decision tree construction
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Genetic Algorithms in Search, Optimization and Machine Learning
Genetic Algorithms in Search, Optimization and Machine Learning
Continuous queries over data streams
ACM SIGMOD Record
Machine Learning
SLIQ: A Fast Scalable Classifier for Data Mining
EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
SPRINT: A Scalable Parallel Classifier for Data Mining
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
IEEE Transactions on Information Theory - Part 2
Incorporating large unlabeled data to enhance EM classification
Journal of Intelligent Information Systems
Hi-index | 0.00 |
This paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy. This is significant for many applications such as image classification where obtaining classification labels is expensive, while large unlabeled examples are easily available. We investigate an Expectation Maximization (EM) algorithm for learning from labeled and unlabeled data. The reason why unlabeled data boosts learning accuracy is because it provides the information about the joint probability distribution. A theoretical argument shows that the more unlabeled examples are combined in learning, the more accurate the result. We then introduce B-EM algorithm, based on the combination of EM with bootstrap method, to exploit the large unlabeled data while avoiding prohibitive I/O cost. Experimental results over both synthetic and real data sets that the proposed approach has a satisfactory performance.