Automatic Chinese text classification using n-gram model

Authors:
Show-Jane Yen;Yue-Shi Lee;Yu-Chieh Wu;Jia-Ching Ying;Vincent S. Tseng
Affiliations:
Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, Ming Chuan University, Taoyuan County, Taiwan;Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan;Dept. of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan
Venue:
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part III
Year:
2010

Abstract

Automatic Chinese text classification is an important and well-known research topic in information retrieval and natural language processing. However, previous research has often ignored the problem of word segmentation and the relationships between words. This paper proposes an N-gram-based language model for Chinese text classification that takes the relationships between words into account. To address the out-of-vocabulary problem, a novel smoothing method based on logistic regression is also proposed to improve performance. Experimental results show that our approach outperforms the previous N-gram-based classification model by more than 11% in micro-averaged F-measure.
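
The general idea of classifying text with per-class N-gram language models can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the model details, so the sketch assumes character-level bigrams (which need no word segmentation) and substitutes plain add-one smoothing for the proposed logistic-regression smoothing. The names `char_bigrams` and `BigramClassifier` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams, with start/end boundary markers."""
    padded = "^" + text + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramClassifier:
    """One character-bigram language model per class (illustrative only)."""

    def __init__(self):
        self.bigram_counts = defaultdict(Counter)   # label -> bigram -> count
        self.context_counts = defaultdict(Counter)  # label -> first char -> count
        self.doc_counts = Counter()                 # label -> number of documents
        self.vocab = set()                          # characters seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            for bg in char_bigrams(text):
                self.bigram_counts[label][bg] += 1
                self.context_counts[label][bg[0]] += 1
                self.vocab.update(bg)
        return self

    def log_score(self, text, label):
        # Add-one smoothing (an assumption, NOT the paper's smoothing method).
        # The extra +1 in v reserves probability mass for unseen characters.
        v = len(self.vocab) + 1
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for bg in char_bigrams(text):
            num = self.bigram_counts[label][bg] + 1
            den = self.context_counts[label][bg[0]] + v
            score += math.log(num / den)
        return score

    def predict(self, text):
        # Choose the class whose language model gives the text the highest score.
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Toy usage with made-up training documents.
clf = BigramClassifier().fit(
    ["球隊贏得比賽", "比賽球員進球", "股票市場上漲", "銀行股票下跌"],
    ["sports", "sports", "finance", "finance"],
)
print(clf.predict("球員比賽"))
```

Each class score is the log prior plus the sum of smoothed log bigram probabilities, so a bigram never seen in a class lowers that class's score without driving it to zero. This is the role smoothing plays for out-of-vocabulary items.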