Automatic text categorization using the importance of sentences

Authors:
Youngjoong Ko;Jinwoo Park;Jungyun Seo
Affiliations:
Sogang University, Mapo-gu, Seoul, Korea;Sogang University, Mapo-gu, Seoul, Korea;Sogang University, Mapo-gu, Seoul, Korea
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

Abstract

Automatic text categorization is the problem of automatically assigning text documents to predefined categories. To classify text documents, we must extract good features from them. In previous research, a text document is commonly represented by the term frequency and the inverse document frequency of each feature. Since a document contains both important and unimportant sentences, features from more important sentences should be weighted more heavily than other features. In this paper, we measure the importance of sentences using text summarization techniques. A document is then represented as a vector of features whose weights vary according to the importance of each sentence. To verify our new method, we conducted experiments on newsgroup data sets in two languages: one written in English and the other in Korean. Four kinds of classifiers were used in our experiments: Naïve Bayes, Rocchio, k-NN, and SVM. We observed that our new method yielded a significant improvement for all classifiers on both data sets.
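
The idea of weighting features by sentence importance can be illustrated with a minimal sketch. The abstract does not say how importance is computed, so the code below makes assumptions: importance is taken to be the fraction of title terms that occur in a sentence (one simple summarization-style heuristic), and each sentence's term counts are scaled by (1 + importance). The function names, the tokenizer, and the scaling rule are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_importance(sentence_tokens, title_tokens):
    """Assumed importance score: fraction of title terms that appear in the sentence."""
    title_terms = set(title_tokens)
    if not title_terms:
        return 0.0
    return len(title_terms & set(sentence_tokens)) / len(title_terms)


def weighted_term_frequencies(title, sentences):
    """Sum term counts over sentences, scaling each sentence's counts by (1 + importance)."""
    title_tokens = tokenize(title)
    weights = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        scale = 1.0 + sentence_importance(tokens, title_tokens)
        for term, count in Counter(tokens).items():
            weights[term] += scale * count
    return dict(weights)


# Toy document: the first sentence shares both title terms, the second shares none.
title = "Text categorization"
sentences = [
    "Text categorization assigns documents to categories.",
    "The weather was pleasant.",
]
vector = weighted_term_frequencies(title, sentences)
```

In this toy document, terms from the first sentence receive twice the weight of terms from the second. In a fuller pipeline these weighted frequencies would presumably be combined with inverse document frequency before being passed to a classifier.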