An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm

Authors:
Zhishan Guo; Li Lu; Shijia Xi; Fuchun Sun
Affiliations:
State Key Laboratory on Intelligent Technology and System, Department of Computer Science and Technology, Tsinghua University, Beijing, China 100084 (all authors)
Venue:
ISNN 2009 Proceedings of the 6th International Symposium on Neural Networks: Advances in Neural Networks - Part II
Year:
2009

Abstract

Many methods have been proposed for Chinese document classification, but the high dimensionality of the feature vector remains one of their most significant limitations. This paper points out an important difference between Chinese and English document classification, then proposes an efficient approach that uses a genetic algorithm to reduce the dimension of the feature vector in Chinese document classification. By selecting only the most "important" features, the proposed method significantly reduces the number of Chinese feature words. Experiments combined with several related studies show that the proposed method substantially reduces the dimension with little loss in the correct classification rate.
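
The abstract does not give the details of the genetic algorithm, so the following Python sketch only illustrates the general idea of genetic-algorithm feature selection, not the authors' method. Everything in it is an assumption made for illustration: the bitmask chromosome, the nearest-centroid classifier, the `penalty` weight on the number of kept features, the one-point crossover, and the synthetic data standing in for feature-word weights.

```python
import random

def accuracy(X, y, mask):
    """Training-set accuracy of a nearest-centroid classifier on the kept features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature set cannot classify anything
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, label in zip(X, y):
        v = [x[j] for j in idx]
        pred = min(classes,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
        correct += pred == label
    return correct / len(X)

def fitness(X, y, mask, penalty=0.3):
    """Assumed fitness: reward accuracy, penalise the fraction of features kept."""
    return accuracy(X, y, mask) - penalty * sum(mask) / len(mask)

def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over features; 1 means the feature is kept."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]          # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m))

def make_toy_data(n_docs=40, n_features=12, seed=1):
    """Synthetic stand-in for feature-word weights: features 0 and 1 carry the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_docs):
        label = i % 2
        row = [rng.random() for _ in range(n_features)]   # noise features
        row[0] = label + 0.1 * rng.random()               # informative
        row[1] = (1 - label) + 0.1 * rng.random()         # informative
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
best = select_features(X, y)
print("kept features:", [j for j, keep in enumerate(best) if keep])
print("accuracy on kept features:", accuracy(X, y, best))
```

In this sketch the penalty term is what pushes the search toward smaller feature sets; raising `penalty` trades accuracy for a stronger reduction, which mirrors the trade-off the abstract describes between dimension reduction and the correct classification rate.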