A New Approach of Feature Selection for Chinese Web Page Categorization

  • Authors:
  • Cunhe Li; Lina Zhu; Kangwei Liu

  • Affiliations:
  • School of Computer & Communication Engineering, China University of Petroleum, Dongying, China 257061, Email: jelly_3@163.com (same affiliation listed for all three authors)

  • Venue:
  • ISICA '08 Proceedings of the 3rd International Symposium on Advances in Computation and Intelligence
  • Year:
  • 2008

Abstract

Feature selection is a key step in web page categorization, and it directly affects both the accuracy and the efficiency of categorization. This paper proposes a new feature selection approach based on the Mutual Information algorithm. The approach retains features whose Mutual Information is negative and emphasizes the occurrence probabilities of features in different categories. It also improves web page preprocessing so that some useful features are preserved. Experiments show that the new feature selection method effectively improves categorization accuracy.
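
The abstract does not give the paper's formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard pointwise Mutual Information between a term and a category, MI(t, c) = log(P(t|c) / P(t)), estimated from document counts. To illustrate keeping negative-MI features, it ranks terms by the category-weighted absolute MI; that weighting, the smoothing constant, and the names `mi_scores` and `select_features` are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def mi_scores(docs, labels, smooth=1e-6):
    """Return {term: {category: pointwise MI}} for tokenized documents.

    MI(t, c) = log(P(t|c) / P(t)), with probabilities estimated from
    document frequencies. `smooth` (an assumption of this sketch) avoids
    log(0) when a term never occurs in a category.
    """
    n_docs = len(docs)
    cat_count = Counter(labels)            # documents per category
    term_count = Counter()                 # documents containing each term
    term_cat_count = defaultdict(Counter)  # documents per (term, category)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):           # count each term once per document
            term_count[term] += 1
            term_cat_count[term][label] += 1

    scores = {}
    for term, df in term_count.items():
        p_t = df / n_docs
        scores[term] = {}
        for cat, n_c in cat_count.items():
            p_t_given_c = term_cat_count[term][cat] / n_c
            scores[term][cat] = math.log((p_t_given_c + smooth) / (p_t + smooth))
    return scores


def select_features(docs, labels, k):
    """Rank terms by sum over categories of P(c) * |MI(t, c)|; keep the top k.

    Taking the absolute value is this sketch's way of letting strongly
    negative MI values (term rarely seen in a category) count as evidence.
    """
    n_docs = len(docs)
    cat_count = Counter(labels)
    scores = mi_scores(docs, labels)

    def weight(term):
        return sum(cat_count[c] / n_docs * abs(v) for c, v in scores[term].items())

    return sorted(scores, key=weight, reverse=True)[:k]
```

With a plain maximum-MI ranking, a term that is strongly absent from a category contributes nothing; under the absolute-value weighting assumed here, that absence raises the term's score. Note that the magnitude of the negative values depends heavily on the chosen smoothing constant.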