Text Classification without Negative Examples Revisit

Authors:
Gabriel Pui Cheong Fung;Jeffrey X. Yu;Hongjun Lu;Philip S. Yu
Affiliations:
-;IEEE Computer Society;-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006


Abstract

Traditionally, building a classifier requires two sets of examples: positive and negative. This paper studies the problem of building a text classifier using positive examples (P) and unlabeled examples (U), where U contains a mixture of positive and negative examples. Since no negative example is given explicitly, building a reliable text classifier becomes far more challenging. Simply treating all of the unlabeled examples as negative and building a classifier from them is a poor approach to this problem. Most previous studies solve the problem with a two-step heuristic: first, extract negative examples (N) from U; second, build a classifier based on P and N. Surprisingly, most studies do not try to extract positive examples from U. Intuitively, enlarging P by P' (positive examples extracted from U) and then building a classifier should improve its effectiveness. In our study, however, we find that extracting P' is very difficult. A document in U that possesses the features exhibited in P is not necessarily a positive example, and vice versa. The very large size of U and its very high diversity also contribute to the difficulty of extracting P'. In this paper, we propose a labeling heuristic called PNLH to tackle this problem. PNLH aims to extract high-quality positive and negative examples from U and can be used on top of any existing classifier. Extensive experiments are conducted on several benchmarks. The results indicate that PNLH is highly feasible, especially when |P| is extremely small.
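
The generic two-step heuristic mentioned in the abstract (extract N from U, then build a classifier on P and N) can be illustrated with a minimal sketch. This is not PNLH, whose details the abstract does not give. The choice of "least similar to the centroid of P" as the extraction rule, the nearest-centroid classifier, the `fraction` parameter, the toy documents, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector, L2-normalized (dict: term -> weight)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}

def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def dot(a, b):
    """Sparse dot product of two dict vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def extract_negatives(P, U, fraction=0.5):
    """Step 1 (assumed rule): take the U documents least similar to the
    centroid of P as likely negatives N."""
    c = centroid(P)
    ranked = sorted(U, key=lambda v: dot(v, c))  # ascending similarity
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

def train(P, N):
    """Step 2: build a (nearest-centroid) classifier from P and N."""
    return centroid(P), centroid(N)

def predict(model, v):
    """True if v is closer to the positive centroid than to the negative one."""
    pos_c, neg_c = model
    return dot(v, pos_c) > dot(v, neg_c)

# Toy data: the positive class is "finance"; U mixes finance and sports.
P = [vectorize(d) for d in ["stock market shares rise",
                            "market investors buy shares"]]
U = [vectorize(d) for d in ["shares fall as market drops",
                            "football team wins match",
                            "team scores late goal in match",
                            "investors sell stock"]]

N = extract_negatives(P, U, fraction=0.5)
model = train(P, N)

is_finance = predict(model, vectorize("market shares climb"))
is_sports_positive = predict(model, vectorize("team wins football match"))
```

In this sketch the positive documents hidden in U are simply left unused. Recovering them as P', which the abstract identifies as the hard part, would require an additional extraction step that is not shown here.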