Learning with unlabeled data for text categorization using bootstrapping and feature projection techniques

Authors:
Youngjoong Ko;Jungyun Seo
Affiliations:
Sogang Univ., Mapo-gu, Seoul, Korea;Sogang Univ., Mapo-gu, Seoul, Korea
Venue:
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Year:
2004

Abstract

A wide range of supervised learning algorithms has been applied to text categorization. However, supervised approaches have some problems. One is that they require a large, often prohibitive, number of labeled training documents to learn accurately. Acquiring class labels for training data is generally costly, whereas gathering a large quantity of unlabeled data is cheap. We propose a new automatic text categorization method that learns from unlabeled data only, using a bootstrapping framework and a feature projection technique. In our experiments, the method achieved performance reasonably comparable to that of a supervised method. Using it in a text categorization task would make building text categorization systems significantly faster and less expensive.