Classifying Chinese texts in two steps

  • Authors:
  • Xinghua Fan; Maosong Sun; Key-sun Choi; Qin Zhang

  • Affiliations:
  • State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing, China; Computer Science Division, Korterm, KAIST, Daejeon, Korea; State Intellectual Property Office of P.R. China, Beijing, China

  • Venue:
  • IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
  • Year:
  • 2005

Abstract

This paper proposes a two-step method for Chinese text categorization (TC). In the first step, a Naïve Bayesian classifier is used to determine the fuzzy area between two categories. In the second step, a classifier with more subtle and powerful features handles the documents in the fuzzy area, whose first-step classifications are considered unreliable. A preliminary experiment validated the soundness of this method. The method is then extended from two-class TC to multi-class TC. Within this two-step framework, we further try to improve the classifier by taking the dependences among features into consideration in the second step, resulting in a Causality Naïve Bayesian Classifier.
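
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the fuzzy area is the set of documents whose Naïve Bayes log-odds fall within a fixed margin of zero, which may differ from the definition used in the paper. The names `train_nb`, `log_odds`, `two_step_classify`, the `margin` parameter, and the `second_step` callback are all hypothetical, and the second-step classifier is left as a placeholder supplied by the caller.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Train a two-class multinomial Naive Bayes model with add-one smoothing.

    docs is a list of token lists; labels is a list of 0/1 class labels.
    """
    vocab = {w for d in docs for w in d}
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        model["prior"][y] = math.log(n_docs[y] / len(docs))
        model["loglik"][y] = {
            w: math.log((counts[y][w] + 1) / total) for w in vocab
        }
    return model


def log_odds(model, doc):
    """Return log P(class 1 | doc) - log P(class 0 | doc).

    Tokens outside the training vocabulary are skipped.
    """
    score = model["prior"][1] - model["prior"][0]
    for w in doc:
        if w in model["vocab"]:
            score += model["loglik"][1][w] - model["loglik"][0][w]
    return score


def two_step_classify(model, doc, second_step, margin=1.0):
    """Classify doc in two steps.

    Step 1: accept the Naive Bayes decision when |log-odds| >= margin.
    Step 2: otherwise the document is treated as lying in the fuzzy area
    (an assumed definition) and is passed to second_step, a caller-supplied
    classifier that returns 0 or 1.
    """
    score = log_odds(model, doc)
    if abs(score) >= margin:
        return (1 if score > 0 else 0), "step1"
    return second_step(doc), "step2"
```

In this sketch `margin` controls how many documents reach the second step: a larger value sends more documents to the second classifier, at the cost of running it more often.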