GA based optimal keyword extraction in an automatic Chinese web document classification system

Authors:
Chih-Hsun Chou;Chin-Chuan Han;Ya-Hui Chen
Affiliations:
Department of Computer Science and Information Engineering, Taiwan, R.O.C.;Department of Computer Science and Information Engineering, Taiwan, R.O.C.;Department of Computer Science and Information Engineering, Taiwan, R.O.C.
Venue:
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Year:
2007

Abstract

The main steps in designing an automatic document classification system are feature extraction and classification. This paper proposes a method to improve feature extraction. In this method, a genetic algorithm (GA) is applied to determine the threshold values of four criteria used to extract the representative keywords for each class. The purpose of these four threshold values is to extract as few representative keywords as possible. The keyword extraction method is combined with two classification algorithms, the vector space model (VSM) and the support vector machine (SVM), to examine the performance of the proposed classification system under various extraction conditions.
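
The idea of searching for four thresholds with a GA can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not name the four criteria, so each candidate keyword simply carries four hypothetical scores in [0, 1]; the `useful` flag is a toy stand-in for the classification accuracy that a VSM or SVM classifier would supply; and the fitness function, penalty weight, and GA settings (truncation selection, one-point crossover, Gaussian mutation, elitism) are arbitrary choices.

```python
import random


def select_keywords(candidates, thresholds):
    """Keep a candidate only if each of its four scores meets its threshold."""
    return [c for c in candidates
            if all(s >= t for s, t in zip(c["scores"], thresholds))]


def fitness(candidates, thresholds, penalty=0.5):
    """Hypothetical fitness: reward keeping useful keywords, penalise set size.

    In a real system the first term would be classifier accuracy (VSM or SVM).
    """
    kept = select_keywords(candidates, thresholds)
    total_useful = sum(c["useful"] for c in candidates)
    if total_useful == 0:
        return 0.0
    recall = sum(c["useful"] for c in kept) / total_useful
    return recall - penalty * len(kept) / len(candidates)


def evolve(candidates, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Search for four thresholds with a simple elitist GA (assumed settings)."""
    rng = random.Random(seed)
    # Start from the "keep everything" individual plus random individuals.
    population = [[0.0] * 4] + [[rng.random() for _ in range(4)]
                                for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(candidates, ind), reverse=True)
        parents = population[:pop_size // 2]      # truncation selection
        children = [population[0][:]]             # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 3)               # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation per gene, clamped to the valid range [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda ind: fitness(candidates, ind))


def make_toy_candidates(n=200, seed=1):
    """Synthetic candidates: useful keywords tend to score higher (toy data)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        useful = rng.random() < 0.3
        base = 0.6 if useful else 0.2
        scores = [min(1.0, max(0.0, rng.gauss(base, 0.15))) for _ in range(4)]
        candidates.append({"scores": scores, "useful": useful})
    return candidates


if __name__ == "__main__":
    cands = make_toy_candidates()
    best = evolve(cands)
    print("thresholds:", [round(t, 2) for t in best])
    print("keywords kept:", len(select_keywords(cands, best)), "of", len(cands))
```

The size penalty in `fitness` is one way to express the stated goal of extracting as few representative keywords as possible; in practice the trade-off weight would need to be tuned against real classification results.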