Traditionally, building a classifier requires two sets of examples: positive examples and negative examples. This paper studies the problem of building a text classifier using only positive examples (P) and unlabeled examples (U). The unlabeled set contains a mixture of positive and negative examples. Since no negative example is given explicitly, building a reliable text classifier becomes far more challenging. Simply treating all of the unlabeled examples as negative and building a classifier from them is a poor approach to this problem. Most existing studies address it with a two-step heuristic: first, extract negative examples (N) from U; second, build a classifier based on P and N. Surprisingly, most studies do not try to extract positive examples from U. Intuitively, enlarging P with P' (positive examples extracted from U) before building the classifier should improve its effectiveness. Throughout our study, we find that extracting P' is very difficult. A document in U that exhibits the features found in P is not necessarily a positive example, and a document that lacks them is not necessarily a negative one. The very large size and very high diversity of U add to the difficulty of extracting P'. In this paper, we propose a labeling heuristic called PNLH to tackle this problem. PNLH aims to extract high-quality positive and negative examples from U and can be used on top of any existing classifier. Extensive experiments on several benchmarks are conducted. The results indicate that PNLH is highly feasible, especially when |P| is extremely small.
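The generic two-step heuristic described in the abstract can be sketched as follows. This is a minimal illustration, not PNLH itself: the bag-of-words representation, the cosine-similarity cutoff for picking negatives, the nearest-centroid classifier, and all function names and the `threshold` value are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Sum of term-count vectors (cosine only depends on direction)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def extract_negatives(P, U, threshold=0.1):
    """Step 1 (assumed rule): unlabeled documents far from the positive centroid become N."""
    pos = centroid(vectorize(d) for d in P)
    return [d for d in U if cosine(vectorize(d), pos) < threshold]


def build_classifier(P, N):
    """Step 2: a nearest-centroid classifier built from P and N."""
    pos = centroid(vectorize(d) for d in P)
    neg = centroid(vectorize(d) for d in N)

    def classify(doc):
        v = vectorize(doc)
        return cosine(v, pos) > cosine(v, neg)

    return classify


# Toy data, invented for illustration only.
P = ["stock market shares rise", "market trading shares fall"]
U = [
    "shares rally as market opens",
    "football team wins final match",
    "coach praises team after match",
]
N = extract_negatives(P, U)
classify = build_classifier(P, N)
```

In this toy run the first unlabeled document shares terms with P and is left out of N, which mirrors the abstract's point that U contains hidden positives. The sketch makes no attempt to recover such documents as P'; that harder step is what the abstract says PNLH targets.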