Automatic text categorization by unsupervised learning

Authors:
Youngjoong Ko;Jungyun Seo
Affiliations:
Sogang University, 1 Sinsu-dong, Mapo-gu, Seoul, Korea;Sogang University, 1 Sinsu-dong, Mapo-gu, Seoul, Korea
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

The goal of text categorization is to classify documents into a certain number of predefined categories. Previous work in this area has relied on large numbers of labeled training documents for supervised learning. The problem is that such documents are difficult to create: unlabeled documents are easy to collect, but categorizing them manually to build a training set is not. In this paper, we propose an unsupervised learning method to overcome this difficulty. The proposed method divides documents into sentences and categorizes each sentence using a keyword list for each category and a sentence similarity measure. It then uses the categorized sentences for training. The proposed method performs comparably to traditional supervised learning methods. It can therefore be used where low-cost text categorization is needed, and it can also be used to create training documents.
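
The pipeline described in the abstract (split documents into sentences, label sentences from per-category keyword lists and a similarity measure, then train on the labeled sentences) can be illustrated with a minimal sketch. The abstract does not specify the keyword lists, the similarity measure, or the classifier, so everything concrete here is an assumption made for illustration: the `KEYWORDS` dictionary is hypothetical, cosine similarity over raw term counts stands in for the sentence similarity measure, and a multinomial Naive Bayes model with add-one smoothing stands in for the training step. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists; the paper's actual lists are not given in the abstract.
KEYWORDS = {
    "sports": {"game", "team", "score"},
    "finance": {"stock", "market", "bank"},
}


def split_sentences(document):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def label_sentences(documents, threshold=0.2):
    """Return (tokens, category) pairs for every sentence that could be labeled."""
    sentences = [tokenize(s) for d in documents for s in split_sentences(d)]
    labeled, unlabeled = [], []
    # Step 1: a sentence matching keywords of exactly one category gets that category.
    for toks in sentences:
        hits = [c for c, kws in KEYWORDS.items() if kws & set(toks)]
        if len(hits) == 1:
            labeled.append((toks, hits[0]))
        else:
            unlabeled.append(toks)
    # Step 2: each remaining sentence takes the category of its most similar
    # keyword-labeled sentence, provided the similarity reaches the threshold.
    seeds = list(labeled)
    for toks in unlabeled:
        best = max(seeds, key=lambda s: cosine(toks, s[0]), default=None)
        if best and cosine(toks, best[0]) >= threshold:
            labeled.append((toks, best[1]))
    return labeled


def train_naive_bayes(labeled):
    """Collect per-category word counts, sentence counts, and the vocabulary."""
    word_counts, class_counts, vocab = {}, Counter(), set()
    for toks, cat in labeled:
        class_counts[cat] += 1
        word_counts.setdefault(cat, Counter()).update(toks)
        vocab.update(toks)
    return word_counts, class_counts, vocab


def classify(document, model):
    """Pick the category with the highest smoothed log-probability."""
    word_counts, class_counts, vocab = model
    toks = [t for t in tokenize(document) if t in vocab]
    total = sum(class_counts.values())

    def log_prob(cat):
        denom = sum(word_counts[cat].values()) + len(vocab)  # add-one smoothing
        lp = math.log(class_counts[cat] / total)
        return lp + sum(math.log((word_counts[cat][t] + 1) / denom) for t in toks)

    return max(class_counts, key=log_prob)
```

For example, calling `label_sentences` on a handful of unlabeled documents and passing the result to `train_naive_bayes` yields a model that `classify` can apply to new text. No stop-word removal is done here, so common words such as "the" inflate the similarity scores; a more careful version would filter them or weight terms.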