An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm
ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part II
State-of-the-art Text Classification (TC) approaches consider only features that appear explicitly in the training set, and after several decades of effort these approaches seem to have reached a plateau. In this paper, we propose an automatic taxonomy mapping algorithm that maps the original flat taxonomy onto a hierarchical, human-edited online taxonomy (ODP). From the mapped taxonomy we then synthesize new training samples that carry common-sense world knowledge by performing a constrained, focused web crawl. We show that by leveraging domain knowledge that cannot be deduced directly from the training set, the text classifier generalizes better. Preliminary experimental results on several Chinese data sets confirm the effectiveness of this approach.