Improving text categorization using the importance of words in different categories

Authors:
Zhihong Deng;Ming Zhang
Affiliations:
National Laboratory on Machine Perception, Peking University, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Venue:
CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part I
Year:
2005

Abstract

Automatic text categorization is the task of assigning natural language text documents to predefined categories based on their context. To classify text documents, we must evaluate the values of the words they contain. In previous research, the value of a word is commonly represented by the product of its term frequency and its inverse document frequency, called TF*IDF for short. Since a word plays a different role in documents of different categories, its value should be measured with respect to each category. In this paper, we propose a new method for measuring the importance of words in categories and a new framework for text categorization. To verify the efficiency of our new method, we conduct experiments on three text collections, using k-NN as the classifier. Experimental results show that our new method yields a significant improvement on all three collections.
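As a point of reference for the TF*IDF baseline mentioned in the abstract, here is a minimal sketch in Python. It assumes one common variant, raw term count multiplied by log(N / df); the abstract does not state which variant the paper uses, and the function name `tf_idf` and the toy documents are purely illustrative. It does not implement the paper's category-aware method, which the abstract does not specify.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF*IDF weights for a list of tokenized documents.

    Assumed variant (not taken from the paper):
    weight(w, d) = count of w in d * log(N / df(w)),
    where N is the number of documents and df(w) is the number of
    documents that contain w.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["stock", "market", "stock"],
    ["market", "news"],
    ["football", "news"],
]
weights = tf_idf(docs)
```

In this sketch the IDF factor of a word is the same no matter which category a document belongs to. That category-independence is the limitation the abstract points to when it argues that a word's value should be measured per category.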