Accurate Chinese Text Classification via Multiple Strategies

Authors:
Xiulan Hao;Chenghong Zhang;Xiaopeng Tao;Shuyun Wang;Yunfa Hu
Affiliations:
Fudan University;Fudan University;Fudan University;Fudan University;Fudan University
Venue:
FSKD '07 Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 03
Year:
2007

Citing 0
Cited 1

An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm

ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part II

Abstract

Text classification is one means of understanding text content. It is widely used in information retrieval, spam filtering, monitoring harmful rumors, and blocking pornographic and malicious messages. kNN is widely used in text categorization, but it suffers when the training data set is biased. In developing the Prototype of Internet Information Security for the Shanghai Council of Information and Security, we found that when the training data set is biased, almost all test documents of some rare (smaller) categories are classified into common (larger) ones by the traditional kNN classifier. In this case, the performance of text classification cannot satisfy the user's requirements. To alleviate this problem, we adopt two measures to boost the kNN classifier. First, we optimize features by removing some candidate features. Second, we modify the traditional decision rules by integrating the number of training samples of each category into them. Extensive experiments show that the adapted kNN achieves a significant improvement in classification performance on biased corpora.
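
The abstract does not give the exact form of the modified decision rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each category's similarity-weighted vote is divided by the category's training-set size raised to a tunable power; the names `knn_classify` and `size_exponent`, and the toy vectors, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=5, size_exponent=0.0):
    """Classify `query` from `training`, a list of (vector, label) pairs.

    size_exponent=0.0 gives plain similarity-weighted kNN.
    size_exponent>0.0 divides each category's vote by
    (number of training samples in that category) ** size_exponent.
    This normalization is an assumption made for illustration only.
    """
    sizes = Counter(label for _, label in training)
    # Score every training document once, then keep the k most similar.
    scored = [(cosine(query, vec), label) for vec, label in training]
    neighbours = sorted(scored, key=lambda s: s[0], reverse=True)[:k]

    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim

    adjusted = {
        label: vote / (sizes[label] ** size_exponent)
        for label, vote in votes.items()
    }
    return max(adjusted, key=adjusted.get)


# Toy biased corpus: six "common" documents, two "rare" ones.
training = (
    [({"a": 1.0, "b": 0.5}, "common") for _ in range(6)]
    + [({"a": 0.5, "b": 1.0}, "rare") for _ in range(2)]
)
query = {"a": 0.6, "b": 1.0}  # closest to the "rare" documents

plain = knn_classify(query, training, k=5)                        # "common"
adapted = knn_classify(query, training, k=5, size_exponent=1.0)   # "rare"
print(plain, adapted)
```

In this toy corpus the query is most similar to the two rare documents, but with k=5 the three common neighbours outvote them under the plain rule. Normalizing by category size reverses the decision, which is the kind of bias the abstract describes.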