Chinese Documents Classification Based on N-Grams

  • Authors:
  • Shuigeng Zhou; Jihong Guan

  • Venue:
  • CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2002

Abstract

Traditional classifiers for Chinese documents are based on keywords in the documents, which requires dictionary support and efficient segmentation procedures. This paper explores techniques for using N-gram information to categorize Chinese documents, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and thereby becomes independent of domain and time. A Chinese document classification system following these techniques is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that our system achieves satisfactory performance, comparable to that of traditional classifiers.
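
The general idea of classifying on N-grams instead of segmented words can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes overlapping character bigrams as features and a multinomial Naive Bayes model with add-one smoothing, and the names `char_ngrams` and `NgramNaiveBayes`, as well as the four toy training documents and their labels, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace.

    No dictionary or word segmentation is needed: the text is simply
    sliced into every run of n consecutive characters.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (assumed: bigrams)
        self.alpha = alpha  # additive smoothing constant (assumed: 1.0)

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        for doc, label in zip(docs, labels):
            self.gram_counts[label].update(char_ngrams(doc, self.n))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        grams = char_ngrams(doc, self.n)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, not data from the paper.
train_docs = ["股票市场今日上涨", "银行利率下调", "足球比赛今晚开始", "篮球联赛冠军"]
train_labels = ["finance", "finance", "sports", "sports"]

model = NgramNaiveBayes(n=2).fit(train_docs, train_labels)
print(model.predict("股票上涨"))
print(model.predict("足球联赛"))
```

Because the features are raw character sequences, the same code handles unseen vocabulary without updating a dictionary, which is the property the abstract attributes to N-gram-based classification. The kNN and hierarchical methods mentioned in the abstract are not covered by this sketch.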