Learning to classify texts using positive and unlabeled data

Authors:
Xiaoli Li; Bing Liu
Affiliations:
School of Computing, National University of Singapore, Singapore-MIT Alliance, Singapore; Department of Computer Science, University of Illinois at Chicago, Chicago, IL
Venue:
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Year:
2003

Abstract

In traditional text classification, a classifier is built from labeled training documents of every class. This paper studies a different problem. Given a set P of documents of a particular class (called the positive class) and a set U of unlabeled documents that contains both positive-class documents and documents of other types (called negative-class documents), we want to build a classifier that separates the documents in U into positive and negative documents. The key feature of this problem is that no labeled negative documents are available, which makes traditional text classification techniques inapplicable. In this paper, we propose an effective technique to solve the problem. It combines the Rocchio method and the SVM technique for classifier building. Experimental results show that the new method significantly outperforms existing methods.