GA based optimal keyword extraction in an automatic Chinese web document classification system

  • Authors:
  • Chih-Hsun Chou;Chin-Chuan Han;Ya-Hui Chen

  • Affiliations:
  • Department of Computer Science and Information Engineering, Taiwan, R.O.C.;Department of Computer Science and Information Engineering, Taiwan, R.O.C.;Department of Computer Science and Information Engineering, Taiwan, R.O.C.

  • Venue:
  • ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
  • Year:
  • 2007

Abstract

The main steps in designing an automatic document classification system are feature extraction and classification. This paper proposes a method to improve feature extraction. In this method, a genetic algorithm (GA) is applied to determine the threshold values of four criteria used to extract the representative keywords for each class. The thresholds are chosen so that as few representative keywords as possible are extracted. The keyword extraction method is combined with two classification algorithms, the vector space model (VSM) and the support vector machine (SVM), to examine the performance of the proposed classification system under various extraction conditions.
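
The idea of searching for four thresholds with a GA can be illustrated by a minimal Python sketch. It is not the authors' implementation: the abstract does not name the four criteria, the fitness function, or the GA operators, so everything here is an assumption made for illustration. Each keyword is assumed to carry four criterion scores in [0, 1], `accuracy_fn` is a hypothetical callback that would wrap a VSM or SVM evaluation, and the `penalty` weight, truncation selection, one-point crossover, and Gaussian mutation are arbitrary choices.

```python
import random


def select_keywords(scores, thresholds):
    """Keep the keywords whose four criterion scores all meet their thresholds."""
    return [kw for kw, s in scores.items()
            if all(v >= t for v, t in zip(s, thresholds))]


def fitness(thresholds, scores, accuracy_fn, penalty=0.5):
    """Reward accuracy and penalize the share of keywords kept (assumed form)."""
    kept = select_keywords(scores, thresholds)
    if not kept:
        return -1.0  # below any value a non-empty keyword set can reach
    return accuracy_fn(kept) - penalty * len(kept) / len(scores)


def evolve_thresholds(scores, accuracy_fn, pop_size=30, generations=40,
                      mutation_rate=0.2, seed=0):
    """Search for four thresholds in [0, 1] with a simple elitist GA."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(4)] for _ in range(pop_size)]

    def score(individual):
        return fitness(individual, scores, accuracy_fn)

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 3)          # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(4):               # Gaussian mutation, clamped to [0, 1]
                if rng.random() < mutation_rate:
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=score)
```

To use the sketch, one would pass a dictionary mapping each candidate keyword to its four scores, together with a function that trains and evaluates a classifier on the selected keywords and returns an accuracy in [0, 1]. The returned list holds the four thresholds of the best individual found.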