Conventional classifiers for Chinese documents are based on keywords in the documents, which requires dictionary support and efficient word-segmentation procedures. This paper explores techniques for using N-gram information to categorize Chinese documents. This frees the classifier from the burden of large dictionaries and complex segmentation processing, and consequently makes it domain- and time-independent. A Chinese document classification system based on these techniques is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that the system achieves satisfactory performance, comparable to that of other traditional classifiers.
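The core idea, replacing dictionary-based word segmentation with overlapping character N-grams, can be illustrated with a minimal sketch. It assumes character unigrams and bigrams as features and a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The abstract does not state the N-gram lengths, feature selection, or weighting actually used, and it does not describe the kNN or hierarchical variants. The names `char_ngrams` and `NGramNaiveBayes` and the toy training sentences are hypothetical and serve only as illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n_max: int = 2) -> list[str]:
    """Return all character n-grams of length 1..n_max (hypothetical helper).

    Whitespace is dropped before windowing. No dictionary or word
    segmentation is used: overlapping character windows stand in for words.
    """
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(chars) - n + 1):
            grams.append("".join(chars[i:i + n]))
    return grams


class NGramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (assumed setup)."""

    def __init__(self, n_max: int = 2, alpha: float = 1.0):
        self.n_max = n_max
        self.alpha = alpha  # add-alpha smoothing; alpha=1.0 is Laplace smoothing

    def fit(self, docs: list[str], labels: list[str]) -> "NGramNaiveBayes":
        self.class_docs = Counter(labels)       # documents per class, for priors
        self.counts = defaultdict(Counter)      # class -> n-gram -> count
        for doc, label in zip(docs, labels):
            self.counts[label].update(char_ngrams(doc, self.n_max))
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc: str, label: str) -> float:
        # log P(c) + sum of log P(g | c) over the document's n-grams
        # (unnormalized: the shared log P(doc) term is omitted).
        score = math.log(self.class_docs[label] / self.n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for g in char_ngrams(doc, self.n_max):
            if g in self.vocab:  # n-grams never seen in training are ignored
                score += math.log((self.counts[label][g] + self.alpha) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_docs, key=lambda label: self.log_posterior(doc, label))


# Toy usage with invented example sentences (not the paper's data).
docs = ["足球比赛今天开始", "篮球比赛结果", "股票市场今天上涨", "银行利率上涨"]
labels = ["sports", "sports", "finance", "finance"]
clf = NGramNaiveBayes(n_max=2).fit(docs, labels)
print(clf.predict("足球比赛"))  # sports
print(clf.predict("股票上涨"))  # finance
```

Because features such as "比赛" or "上涨" come directly from sliding windows over the characters, no lexicon has to be maintained as vocabulary changes over time. This is the property the abstract links to domain and time independence.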