Extracting initial and reliable negative documents to enhance classification performance

Authors:
Hui Wang;Wanli Zuo
Affiliations:
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun, China;College of Computer Science and Technology, Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Changchun, China
Venue:
KDLL'06 Proceedings of the 2006 international conference on Knowledge Discovery in Life Science Literature
Year:
2006

Abstract

Most existing text classification work assumes that the training data are completely labeled. In practice, some information retrieval problems can only be described as learning a binary classifier from incompletely labeled examples, where a small set of labeled positive examples and a very large set of unlabeled examples are provided. In this setting, traditional text classification methods cannot work properly. In this paper, we propose a method called Weighted Voting Classifier, which is an improved 1-DNF algorithm. Experimental results on the Reuters-21578 collection show that Weighted Voting Classifier outperforms both PEBL and one-class SVM in terms of F measure. Furthermore, the reduction in the number of iterations is 2.26 when comparing PEBL with our method.
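The abstract does not describe the Weighted Voting Classifier itself, so the sketch below shows only the baseline 1-DNF idea that it is said to improve, as commonly described for PEBL: a word counts as a positive feature if it occurs in a larger fraction of positive documents than of unlabeled documents, and an unlabeled document containing no positive feature is taken as a reliable negative. The function names, the token-list document representation, and the use of the balanced F measure (F1) are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def positive_features(positive_docs, unlabeled_docs):
    """Return words that occur in a larger fraction of positive documents
    than of unlabeled documents (the 1-DNF positive feature set).

    Each document is assumed to be a list of tokens (illustrative choice).
    """
    # Document frequency: count each word at most once per document.
    pos_df = Counter(w for doc in positive_docs for w in set(doc))
    unl_df = Counter(w for doc in unlabeled_docs for w in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)

    features = set()
    for word, count in pos_df.items():
        pos_rate = count / n_pos
        unl_rate = unl_df[word] / n_unl if n_unl else 0.0
        if pos_rate > unl_rate:
            features.add(word)
    return features


def reliable_negatives(positive_docs, unlabeled_docs):
    """Return unlabeled documents that contain no positive feature."""
    features = positive_features(positive_docs, unlabeled_docs)
    return [doc for doc in unlabeled_docs if features.isdisjoint(doc)]


def f_measure(precision, recall):
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    P = [["wheat", "grain", "export"], ["wheat", "price"]]
    U = [["wheat", "harvest"], ["oil", "price"], ["bank", "rate"], ["oil", "bank"]]
    print(reliable_negatives(P, U))
    print(f_measure(0.5, 1.0))
```

In approaches of this family, the reliable negatives and the labeled positives would then seed an iterative classifier such as an SVM; how the paper weights or votes over features at this step is not stated in the abstract and is deliberately left out here.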