Text classification by labeling words

Authors:
Bing Liu;Xiaoli Li;Wee Sun Lee;Philip S. Yu
Affiliations:
Department of Computer Science, University of Illinois at Chicago;Department of Computer Science, National University of Singapore;Department of Computer Science, National University of Singapore;IBM T. J. Watson Research Center
Venue:
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Year:
2004

Abstract

Traditionally, text classifiers are built from labeled training examples. Labeling is usually done manually by human experts (or the users), which is a labor-intensive and time-consuming process. In the past few years, researchers have investigated various forms of semi-supervised learning to reduce the burden of manual labeling. In this paper, we propose a different approach. Instead of labeling a set of documents, the proposed method labels a set of representative words for each class. It then uses these words to extract a set of documents for each class from a set of unlabeled documents to form the initial training set. The EM algorithm is then applied to build the classifier. The key issue of the approach is how to obtain a set of representative words for each class. One way is to ask the user to provide them, which is difficult because the user can usually give only a few words (which are insufficient for accurate learning). We propose a method to solve this problem that combines clustering and feature selection. The technique can effectively rank the words in the unlabeled set according to their importance. The user then selects and labels some words from the ranked list for each class. This process requires less effort than providing words without help or manually labeling documents. Our results show that the new method is highly effective and promising.
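The pipeline described in the abstract (labeled words, then an initial training set, then EM) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier or the extraction rule, so the sketch assumes a multinomial naive Bayes model with Laplace smoothing, assigns a document to a class only when that class has strictly the most seed-word matches, and keeps seed-labeled documents fixed during EM. The names `seed_label`, `train_nb`, `posterior`, and `em_classify` are hypothetical, and the word-ranking step (clustering plus feature selection) is not covered.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def seed_label(docs, seed_words):
    """Label a document when exactly one class has the most seed-word hits (assumed rule)."""
    labels = {}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        hits = {c: len(tokens & words) for c, words in seed_words.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
            labels[i] = best
    return labels


def train_nb(docs_tokens, post, classes, vocab, alpha=1.0):
    """M-step: estimate naive Bayes log-parameters from (soft) class weights."""
    total_weight = sum(sum(p.values()) for p in post)
    log_prior, log_word = {}, {}
    for c in classes:
        class_weight = sum(p[c] for p in post)
        log_prior[c] = math.log(
            (class_weight + alpha) / (total_weight + alpha * len(classes))
        )
        counts = Counter()
        for tokens, p in zip(docs_tokens, post):
            for t in tokens:
                counts[t] += p[c]
        denom = sum(counts.values()) + alpha * len(vocab)
        log_word[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_word


def posterior(tokens, log_prior, log_word, classes):
    """E-step for one document: normalized class probabilities."""
    scores = {c: log_prior[c] + sum(log_word[c][t] for t in tokens) for c in classes}
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def em_classify(docs, seed_words, iterations=10):
    classes = sorted(seed_words)
    docs_tokens = [tokenize(d) for d in docs]
    vocab = {t for tokens in docs_tokens for t in tokens}
    seeds = seed_label(docs, seed_words)
    # Seed-labeled documents start with hard labels; all others start with zero weight.
    post = [{c: 1.0 if seeds.get(i) == c else 0.0 for c in classes}
            for i in range(len(docs))]
    for _ in range(iterations):
        log_prior, log_word = train_nb(docs_tokens, post, classes, vocab)
        # Assumption: seed-labeled documents keep their labels across iterations.
        post = [post[i] if i in seeds
                else posterior(docs_tokens[i], log_prior, log_word, classes)
                for i in range(len(docs))]
    return [max(p, key=p.get) for p in post]


docs = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the senate passed the election bill",
    "the minister debated the bill in parliament",
]
seed_words = {"sports": {"football"}, "politics": {"election"}}
print(em_classify(docs, seed_words))
```

In this toy run, only the first and third documents contain a seed word. The other two receive labels through EM because they share vocabulary ("match", "bill") with the seed-labeled documents.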