Automatic Chinese text classification using n-gram model

  • Authors:
  • Show-Jane Yen;Yue-Shi Lee;Yu-Chieh Wu;Jia-Ching Ying;Vincent S. Tseng

  • Affiliations:
  • Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan;Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan

  • Venue:
  • ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part III
  • Year:
  • 2010

Abstract

Automatic Chinese text classification is an important and well-known research topic in information retrieval and natural language processing. However, previous studies often ignore the problem of word segmentation and the relationships between words. This paper proposes an N-gram-based language model for Chinese text classification that takes the relationships between words into account. To mitigate the out-of-vocabulary problem, a novel smoothing method based on logistic regression is also proposed to improve performance. Experimental results show that our approach outperforms the previous N-gram-based classification model by more than 11% in micro-averaged F-measure.
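The general idea of classifying text with per-class N-gram language models can be illustrated with a minimal sketch. This is not the paper's method: it assumes character bigrams (which sidestep word segmentation) and swaps in plain add-one (Laplace) smoothing where the paper proposes a logistic-regression-based smoother, whose details the abstract does not give. The class name `BigramClassifier`, the helper `char_bigrams`, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return character bigrams, with boundary markers at both ends."""
    padded = "^" + text + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramClassifier:
    """One character-bigram language model per class (illustrative only).

    Assumption: add-one smoothing stands in for the paper's
    logistic-regression-based smoothing, which is not reproduced here.
    """

    def __init__(self):
        self.bigram_counts = defaultdict(Counter)   # class -> bigram -> count
        self.context_counts = defaultdict(Counter)  # class -> first char -> count
        self.doc_counts = Counter()                 # class -> number of documents
        self.vocab = set()                          # characters seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            for bg in char_bigrams(text):
                self.bigram_counts[label][bg] += 1
                self.context_counts[label][bg[0]] += 1
                self.vocab.update(bg)
        return self

    def log_score(self, text, label):
        """Log prior of the class plus smoothed log-likelihood of the text."""
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        for bg in char_bigrams(text):
            num = self.bigram_counts[label][bg] + 1
            den = self.context_counts[label][bg[0]] + v
            score += math.log(num / den)
        return score

    def predict(self, text):
        """Return the class whose language model scores the text highest."""
        return max(self.doc_counts, key=lambda c: self.log_score(text, c))


# Toy training data, invented for illustration.
docs = ["股市今日上涨", "股票市场下跌", "球队赢得比赛", "足球比赛结束"]
labels = ["finance", "finance", "sports", "sports"]
model = BigramClassifier().fit(docs, labels)

print(model.predict("股市下跌"))
print(model.predict("比赛结束"))
```

Because every bigram receives a nonzero probability, text containing characters never seen in training can still be scored, which is the role any smoothing method plays against the out-of-vocabulary problem.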