Learning to identify unexpected instances in the test set

Authors:
Xiao-Li Li;Bing Liu;See-Kiong Ng
Affiliations:
Institute for Infocomm Research, Singapore;Department of Computer Science, University of Illinois at Chicago, Chicago, IL;Institute for Infocomm Research, Singapore
Venue:
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Year:
2007

Citing (7):

A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Partially Supervised Classification of Text Documents
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning

PEBL: positive example based learning for Web page classification using SVM
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

One-class SVMs for document classification
The Journal of Machine Learning Research

A needle in a haystack: local one-class optimization
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Text Classification without Labeled Negative Documents
ICDE '05 Proceedings of the 21st International Conference on Data Engineering

Learning to classify texts using positive and unlabeled data
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Cited by (9):

Directly Identify Unexpected Instances in the Test Set by Entropy Maximization
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management

Distributional similarity vs. PU learning for entity set expansion
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers

Negative training data can be harmful to text classification
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Semi-supervised learning from only positive and unlabeled data using entropy
WAIM'10 Proceedings of the 11th international conference on Web-age information management

Estimate unlabeled-data-distribution for semi-supervised PU learning
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications

Ensemble based positive unlabeled learning for time series classification
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I

Positive unlabeled learning for time series classification
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two

What users care about: a framework for social content alignment
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Instance selection and instance weighting for cross-domain sentiment classification via PU learning
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

Traditional classification involves building a classifier using labeled training examples from a set of predefined classes and then applying the classifier to classify test instances into the same set of classes. In practice, this paradigm can be problematic because the test data may contain instances that do not belong to any of the previously defined classes. Detecting such unexpected instances in the test set is an important issue in practice. The problem can be formulated as learning from positive and unlabeled examples (PU learning). However, current PU learning algorithms require a large proportion of negative instances in the unlabeled set to be effective. This paper proposes a novel technique to solve this problem in the text classification domain. The technique first generates a single artificial negative document AN. The positive set P and {AN} are then used to build a naïve Bayesian classifier. Our experimental results show that this method is significantly better than existing techniques.
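
The two steps named in the abstract (generate AN, then train a naïve Bayesian classifier on P and {AN}) can be sketched roughly as follows. The abstract does not say how AN is generated, so the rule used here is only an assumption for illustration: AN receives the words whose relative frequency in the unlabeled set (called U here) exceeds their relative frequency in P. The function names, the equal class priors, and the Laplace smoothing are likewise hypothetical choices, not details taken from the paper.

```python
import math
from collections import Counter


def word_counts(docs):
    """Sum token counts over a list of tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def make_artificial_negative(positive_docs, unlabeled_docs):
    """Build one artificial negative document AN as a bag of pseudo-counts.

    ASSUMPTION (not stated in the abstract): a word enters AN only if its
    relative frequency in U is higher than its relative frequency in P.
    """
    p_counts = word_counts(positive_docs)
    u_counts = word_counts(unlabeled_docs)
    p_total = sum(p_counts.values()) or 1
    u_total = sum(u_counts.values()) or 1
    an = {}
    for word, u_count in u_counts.items():
        surplus = u_count / u_total - p_counts[word] / p_total
        if surplus > 0:
            an[word] = surplus * u_total  # pseudo-count of the word in AN
    return an


def train_nb(p_counts, an_counts, prior_positive=0.5):
    """Two-class multinomial naive Bayes with Laplace smoothing."""
    vocab = set(p_counts) | set(an_counts)
    classes = (("P", p_counts, prior_positive),
               ("AN", an_counts, 1.0 - prior_positive))
    model = {}
    for label, counts, prior in classes:
        denom = sum(counts.values()) + len(vocab)
        log_likelihood = {w: math.log((counts.get(w, 0) + 1) / denom)
                          for w in vocab}
        model[label] = (math.log(prior), log_likelihood)
    return model


def classify(model, doc):
    """Return "P" or "AN"; words outside the vocabulary are ignored."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in doc if w in log_likelihood)
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration.
positive = [["goal", "match", "team"], ["team", "coach", "match"]]
unlabeled = [["match", "goal", "team"],
             ["stock", "market", "price"],
             ["market", "price", "bank"]]

an_doc = make_artificial_negative(positive, unlabeled)
nb_model = train_nb(word_counts(positive), an_doc)
unexpected = [doc for doc in unlabeled if classify(nb_model, doc) == "AN"]
```

In this toy run, documents assigned to the AN class are the candidates for unexpected instances. A real system would need proper tokenization, feature selection, and a principled choice of priors, none of which the abstract specifies.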