Combining Labeled and Unlabeled Data for Text Classification with a Large Number of Categories

Authors:
Rayid Ghani
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Abstract

We develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems. We show that our method is especially useful for classification tasks involving a large number of categories, where Co-Training does not perform very well by itself. When combined with ECOC, it outperforms several other algorithms that combine labeled and unlabeled data for text classification in terms of accuracy, precision-recall tradeoff, and efficiency.
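
The ECOC decomposition mentioned in the abstract can be illustrated with a minimal sketch. It covers only the encoding and decoding steps: each class gets a binary codeword, each bit position defines one binary problem, and a vector of predicted bits is decoded to the class with the nearest codeword. The class names, the 7-bit code matrix, and the function names below are illustrative assumptions, not taken from the paper, and the Co-Training learner that would be trained on each binary problem is omitted.

```python
# Hypothetical 4-class code matrix (an exhaustive 7-bit code).
# Every pair of codewords differs in 4 bits, so one wrong bit can be corrected.
CODE_MATRIX = {
    "business": (1, 1, 1, 1, 1, 1, 1),
    "sports":   (0, 0, 0, 0, 1, 1, 1),
    "science":  (0, 0, 1, 1, 0, 0, 1),
    "health":   (0, 1, 0, 1, 0, 1, 0),
}


def binary_labels(labels, bit):
    """Relabel multiclass labels as 0/1 targets for the binary problem at `bit`."""
    return [CODE_MATRIX[y][bit] for y in labels]


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda c: hamming(CODE_MATRIX[c], predicted_bits))


if __name__ == "__main__":
    # Targets for the binary problem defined by bit position 3.
    print(binary_labels(["business", "sports", "health"], 3))
    # The "science" codeword with its first bit flipped by a faulty binary classifier.
    print(decode((1, 0, 1, 1, 0, 0, 1)))
```

In a full system along the lines the abstract describes, each of the seven binary problems would be learned from labeled and unlabeled documents by a Co-Training procedure, and `decode` would be applied to the seven resulting predictions for each test document.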