Co-training with a Single Natural Feature Set Applied to Email Classification

Authors:
Jason Chan;Irena Koprinska;Josiah Poon
Affiliations:
The University of Sydney, Australia;The University of Sydney, Australia;The University of Sydney, Australia
Venue:
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2004


Abstract

When dealing with information overload from the Internet, in tasks such as Web page classification and email spam filtering, a new technique called co-training has been shown to be a promising way to build more accurate classifiers. Co-training allows classifiers to learn from fewer labelled documents by taking advantage of the more abundant unlabelled documents. However, conventional co-training requires the dataset to be described by two disjoint, natural feature sets that are sufficiently redundant, and in many practical situations it is not obvious how to obtain two such sets. This paper shows that co-training is beneficial for email classification when only a single natural feature set is used.
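
To make the idea concrete, here is a minimal sketch of a co-training loop over a single feature set. The abstract does not say how the two views are formed or which classifier is used, so everything specific here is an assumption made for illustration: the vocabulary is split at random into two disjoint halves, each view uses a small multinomial Naive Bayes model, and each view moves its single most confident unlabelled document into the labelled set per round. The names `train_nb`, `predict_nb`, `co_train`, and `classify` are hypothetical.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, vocab):
    """Fit a multinomial Naive Bayes model restricted to the words in `vocab`."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        counts[y].update(w for w in doc if w in vocab)
    return classes, prior, counts, vocab


def predict_nb(model, doc):
    """Return (label, confidence); confidence is the posterior of that label."""
    classes, prior, counts, vocab = model
    scores = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)  # Laplace smoothing
        scores[c] = prior[c] + sum(
            math.log((counts[c][w] + 1) / total) for w in doc if w in vocab
        )
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    best = max(classes, key=lambda c: scores[c])
    return best, 1.0 / norm


def co_train(labeled, unlabeled, rounds=5, seed=0):
    """labeled: list of (tokens, label); unlabeled: list of token lists.

    Assumption: the two views come from a random split of one vocabulary.
    Returns the two final per-view models and the grown labelled set.
    """
    vocab = sorted({w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d})
    random.Random(seed).shuffle(vocab)
    half = len(vocab) // 2
    views = [set(vocab[:half]), set(vocab[half:])]

    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in views:
            if not pool:
                break
            model = train_nb([d for d, _ in labeled], [y for _, y in labeled], view)
            # Pick the pool document this view labels most confidently.
            best_i, (label, _conf) = max(
                enumerate(predict_nb(model, d) for d in pool),
                key=lambda item: item[1][1],
            )
            labeled.append((pool.pop(best_i), label))
        if not pool:
            break

    docs, ys = [d for d, _ in labeled], [y for _, y in labeled]
    return [train_nb(docs, ys, v) for v in views], labeled


def classify(models, doc):
    """Use the prediction of whichever view is more confident."""
    return max((predict_nb(m, doc) for m in models), key=lambda p: p[1])
```

Calling `co_train(labeled, unlabeled)` with a handful of labelled emails and a larger unlabelled pool returns the two view models, which `classify` then combines. A random split is only one possible way to obtain two views from a single feature set, and how well it works depends on how redundant the features are.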