This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Traditional binary classification involves building a classifier from labeled positive and negative training examples. The classifier is then applied to classify test instances into positive and negative classes. A fundamental assumption is that the training and test data are identically distributed. However, this assumption may not hold in practice. In this paper, we study a particular problem in which the positive training and test data are identically distributed, but the negative training and test data may or may not be. Many practical text classification and retrieval applications fit this model. We argue that in this setting negative training data should not be used, and that PU learning (learning from positive and unlabeled examples) can be employed to solve the problem. Empirical evaluation has been conducted to support our claim. This result is important as it may fundamentally change the current binary classification paradigm.
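To make the PU learning setting concrete, the following is a minimal sketch of one common two-step strategy: first extract "reliable negatives" from the unlabeled set, then build a classifier from the positives and those reliable negatives. It is an illustration only and is not claimed to be the method evaluated in the paper. The function names (`vectorize`, `centroid`, `cosine`, `train_pu`), the centroid-based similarity rule, and the toy documents are all assumptions introduced here.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average of the L2-normalized term vectors (empty input gives {})."""
    total = Counter()
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v.values())) or 1.0
        for term, count in v.items():
            total[term] += count / norm
    n = len(vectors) or 1
    return {term: weight / n for term, weight in total.items()}


def cosine(v, c):
    """Cosine similarity between a term vector and a centroid."""
    dot = sum(w * c.get(t, 0.0) for t, w in v.items())
    nv = math.sqrt(sum(w * w for w in v.values()))
    nc = math.sqrt(sum(w * w for w in c.values()))
    return dot / (nv * nc) if nv and nc else 0.0


def train_pu(positive_docs, unlabeled_docs):
    """Hypothetical two-step PU learner; no labeled negatives are used.

    Returns (classify, reliable_negative_indices).
    """
    P = [vectorize(d) for d in positive_docs]
    U = [vectorize(d) for d in unlabeled_docs]

    # Step 1: provisionally treat the unlabeled set as negative and keep
    # only documents closer to the unlabeled centroid than to the positive
    # centroid. These are the "reliable negatives".
    pos_c = centroid(P)
    unl_c = centroid(U)
    reliable = [i for i, u in enumerate(U)
                if cosine(u, unl_c) > cosine(u, pos_c)]

    # Step 2: rebuild the negative centroid from reliable negatives only
    # (fall back to the unlabeled centroid if none were found).
    neg_c = centroid([U[i] for i in reliable]) if reliable else unl_c

    def classify(text):
        v = vectorize(text)
        return cosine(v, pos_c) > cosine(v, neg_c)

    return classify, reliable


if __name__ == "__main__":
    # Toy data (assumed for illustration): the unlabeled set mixes one
    # hidden positive with documents from an unknown negative distribution.
    positive = ["cheap laptop battery deal",
                "laptop screen repair deal",
                "new laptop keyboard sale"]
    unlabeled = ["cheap laptop deal today",
                 "football match score tonight",
                 "recipe for tomato soup",
                 "football league transfer news"]
    classify, reliable = train_pu(positive, unlabeled)
    print(reliable, classify("laptop deal on sale"))
```

In this sketch the unlabeled set plays the role of the test-time data, so the negative model is derived from the distribution actually encountered rather than from a separately labeled negative training set. In practice the second step would typically use a stronger learner than a centroid comparison.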