Text Learning and Hierarchical Feature Selection in Webpage Classification

  • Authors:
  • Xiaogang Peng; Zhong Ming; Haitao Wang

  • Affiliations:
  • College of Information Engineering (Software College), Shenzhen University, P.R. China 518060; College of Information Engineering (Software College), Shenzhen University, P.R. China 518060; College of Information Engineering (Software College), Shenzhen University, P.R. China 518060

  • Venue:
  • ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

One way to retrieve information from the Internet is to classify web pages automatically. Feature selection is an important issue in almost all published classification methods. Although many feature selection methods have been proposed, most of them focus on the features within a category and ignore the fact that the hierarchy of categories also plays an important role in achieving accurate classification results. This paper proposes a new feature selection method that incorporates hierarchical information, which prevents the classification process from having to visit every node in the hierarchy. Our test results show that our classification algorithm using hierarchical information reduces the search complexity from n to log(n) and increases accuracy by 6.2% compared with a related algorithm.
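
To illustrate why a category hierarchy can spare the classifier from visiting every node, here is a minimal sketch in Python. It is not the authors' algorithm: the `Node` class, the per-node term weights, the additive `score` function, and the toy category tree are all assumptions made for illustration. The sketch contrasts a greedy top-down descent, which scores only the children along one path, with a flat search that scores every leaf category.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in a hypothetical hierarchy (illustrative only)."""
    name: str
    # Assumed per-node feature weights; the paper's actual selection
    # method is not described in the abstract.
    features: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def score(node, tokens):
    """Sum the node's weights for the page's tokens (assumed scoring rule)."""
    return sum(node.features.get(t, 0.0) for t in tokens)


def classify_top_down(root, tokens):
    """Follow the best-scoring child at each level.

    Returns the chosen leaf name and the number of nodes scored.
    """
    node, scored = root, 0
    while node.children:
        scored += len(node.children)
        node = max(node.children, key=lambda c: score(c, tokens))
    return node.name, scored


def classify_flat(root, tokens):
    """Score every leaf category, ignoring the hierarchy."""
    leaves, stack = [], [root]
    while stack:
        current = stack.pop()
        if current.children:
            stack.extend(current.children)
        else:
            leaves.append(current)
    best = max(leaves, key=lambda n: score(n, tokens))
    return best.name, len(leaves)


# Toy hierarchy: 3 top-level categories, each with 3 leaf categories.
root = Node("root", children=[
    Node("sports", {"team": 1.0, "game": 1.0}, [
        Node("football", {"goal": 1.0}),
        Node("tennis", {"racket": 1.0}),
        Node("golf", {"putt": 1.0}),
    ]),
    Node("science", {"research": 1.0, "data": 1.0}, [
        Node("physics", {"quantum": 1.0}),
        Node("biology", {"cell": 1.0}),
        Node("chemistry", {"acid": 1.0}),
    ]),
    Node("arts", {"gallery": 1.0}, [
        Node("music", {"chord": 1.0}),
        Node("film", {"scene": 1.0}),
        Node("painting", {"canvas": 1.0}),
    ]),
])

tokens = ["team", "game", "goal"]
print(classify_top_down(root, tokens))  # ('football', 6)
print(classify_flat(root, tokens))      # ('football', 9)
```

In this toy tree the top-down descent scores 6 nodes while the flat search scores all 9 leaves. For a balanced hierarchy with a fixed branching factor, the number of nodes scored by the descent grows with the depth of the tree, that is, logarithmically in the number of leaf categories, which is the kind of saving the abstract reports. The sketch also shows the usual trade-off: a wrong choice at an upper level cannot be corrected lower down, so the features chosen for upper-level nodes matter a great deal.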