A high performance prototype system for Chinese text categorization

Authors:
Xinghua Fan
Affiliations:
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, P.R. China
Venue:
MICAI'06: Proceedings of the 5th Mexican International Conference on Artificial Intelligence
Year:
2006

Abstract

Improving classification accuracy is a major challenge in text categorization. This paper proposes a high-performance prototype system for Chinese text categorization, consisting mainly of a feature extraction subsystem, a feature selection subsystem, and a subsystem that evaluates the reliability of classification results. The system employs a two-step classification strategy. In the first step, features that are effective for all test texts are used to classify the texts. The reliability evaluation subsystem then assesses each classification result directly from the classifier's outputs and divides the texts into two groups: those classified reliably and those classified unreliably. Only the texts classified unreliably in the first step proceed to the second step, in which a classifier reclassifies them using features that are more subtle and more powerful for those texts. The prototype system has been implemented with a Naive Bayes classifier in both the first and second steps, and experiments show that it achieves high performance.
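The two-step strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Both steps use a multinomial Naive Bayes classifier with add-one smoothing. The following are hypothetical choices, and the paper's actual features and reliability criterion may differ:

- character unigrams as the step-1 features
- character bigrams as the "more subtle" step-2 features
- the margin between the two highest log-posterior scores as the reliability measure
- the `threshold` value

```python
import math
from collections import Counter, defaultdict


def char_unigrams(text):
    """Hypothetical step-1 features: single Chinese characters."""
    return list(text)


def char_bigrams(text):
    """Hypothetical step-2 features: adjacent character pairs."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.featurize(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(n.values()) for c, n in self.feature_counts.items()}
        return self

    def log_scores(self, text):
        """Return {class: log P(c) + sum_f log P(f | c)} for one text."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for f in self.featurize(text):
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((self.feature_counts[c][f] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores


def margin(scores):
    """Assumed reliability measure: best minus second-best log score."""
    top = sorted(scores.values(), reverse=True)
    return top[0] - top[1] if len(top) > 1 else math.inf


def two_step_classify(text, step1, step2, threshold):
    """Return (label, step); step 2 runs only when step 1 is judged unreliable."""
    s1 = step1.log_scores(text)
    if margin(s1) >= threshold:  # reliable: keep the step-1 decision
        return max(s1, key=s1.get), 1
    s2 = step2.log_scores(text)  # unreliable: reclassify with the step-2 features
    return max(s2, key=s2.get), 2


# Example usage (toy data, hypothetical labels):
# texts = ["股票市场上涨", "股票价格下跌", "足球比赛精彩", "篮球比赛开始"]
# labels = ["finance", "finance", "sports", "sports"]
# step1 = NaiveBayes(char_unigrams).fit(texts, labels)
# step2 = NaiveBayes(char_bigrams).fit(texts, labels)
# two_step_classify("股票", step1, step2, threshold=1.0)  # -> ("finance", 1)
```

A text reaches the second classifier only when its first-step margin falls below `threshold`. In a real system such a threshold would presumably be tuned on held-out data, trading the accuracy gained on hard texts against the cost of running the second step.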