Chinese Documents Classification Based on N-Grams

Authors:
Shuigeng Zhou;Jihong Guan
Venue:
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2002

Citing 2

Classifying news stories using memory based reasoning
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Chinese text segmentation for text retrieval: achievements and problems
Journal of the American Society for Information Science

Cited 6

Evaluation and Construction of Training Corpuses for Text Classification: A Preliminary Study
NLDB '02 Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems-Revised Papers

An Approach to Improve Text Classification Efficiency
ADBIS '02 Proceedings of the 6th East European Conference on Advances in Databases and Information Systems

Pruning Training Corpus to Speedup Text Classification
DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications

Chinese text categorization based on the binary weighting model with non-binary smoothing
ECIR'03 Proceedings of the 25th European conference on IR research

A study on feature weighting in Chinese text categorization
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing

Free-gram phrase identification for modeling Chinese text
Information Processing Letters

Abstract

Traditional Chinese document classifiers rely on keywords in the documents, which requires dictionary support and efficient segmentation procedures. This paper explores techniques that use N-gram information to categorize Chinese documents, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and, as a result, becomes domain and time independent. A Chinese document classification system following the techniques described above is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that the system achieves satisfactory performance, comparable with that of traditional classifiers.
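
To illustrate the general idea of segmentation-free classification, here is a minimal sketch that builds overlapping character N-grams and feeds them to a multinomial Naive Bayes classifier. It is not the authors' system: the abstract does not state which N values, feature selection, weighting, or smoothing were used, so the choice of unigrams plus bigrams, the add-one smoothing, the names `char_ngrams` and `NgramNaiveBayes`, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2)):
    """Return overlapping character n-grams; no dictionary or segmenter needed."""
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in n_values:
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts (add-one smoothing assumed)."""

    def __init__(self, n_values=(1, 2)):
        self.n_values = n_values
        self.class_doc_counts = Counter()        # documents seen per class
        self.gram_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n_values)
            self.class_doc_counts[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocab)
        # N-grams never seen in training carry no class evidence, so skip them.
        grams = [g for g in char_ngrams(text, self.n_values) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            counts = self.gram_counts[label]
            total = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for g in grams:
                score += math.log((counts[g] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, for illustration only.
train_docs = ["股票市场上涨", "银行利率下调", "足球比赛结束", "篮球联赛开幕"]
train_labels = ["finance", "finance", "sports", "sports"]
model = NgramNaiveBayes().fit(train_docs, train_labels)
print(model.predict("股票利率"))
print(model.predict("足球联赛"))
```

Because the features are raw character sequences, new vocabulary in the input only adds unseen N-grams rather than breaking a segmentation step, which is the kind of independence from dictionaries the abstract refers to. A kNN or hierarchical variant could reuse the same `char_ngrams` features.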