Automatic web pages categorization with ReliefF and Hidden Naive Bayes

  • Authors:
  • Xin Jin;Rongyan Li;Xian Shen;Rongfang Bie

  • Affiliations:
  • Beijing Normal University, Beijing, China;Beijing Normal University, Beijing, China;Beijing Normal University, Beijing, China;Beijing Normal University, Beijing, China

  • Venue:
  • Proceedings of the 2007 ACM symposium on Applied computing
  • Year:
  • 2007

Abstract

A major challenge in web mining arises from the increasingly large volume of web pages and the high dimensionality associated with natural language. Because identifying the web pages that belong to a class of interest is often the first step in mining the web, web page categorization (classification) is one of the essential techniques for web mining. One of its main challenges is the high-dimensional text vocabulary space. In this research, we propose a Hidden Naive Bayes based method for web page classification. We also propose using the ReliefF feature selection method to select relevant words and thereby improve classification performance. Comparisons with traditional techniques are provided. Results on a benchmark dataset show that the proposed methods are promising for accurate web page classification.
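
To make the feature selection step concrete, here is a minimal sketch of ReliefF-style word weighting on binary bag-of-words vectors. It is not the authors' implementation. The function names (`relieff_weights`, `top_features`), the toy vocabulary and pages, the Manhattan distance, and the choice of `k=1` are all assumptions made for illustration. The sketch penalises a word when it differs between a page and its nearest same-class neighbours, and rewards it when it differs from the nearest pages of other classes. The Hidden Naive Bayes classifier itself is not shown.

```python
from collections import defaultdict


def relieff_weights(X, y, k=1):
    """Score each feature; higher means more useful for separating classes.

    X: list of equal-length feature vectors with values in [0, 1].
    y: list of class labels, one per row of X.
    k: number of nearest hits/misses per class (illustrative default).
    """
    n, d = len(X), len(X[0])
    prior = {c: y.count(c) / n for c in set(y)}
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance; on 0/1 vectors this counts differing words.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        # Group every other instance by its class label.
        by_class = defaultdict(list)
        for j in range(n):
            if j != i:
                by_class[y[j]].append(j)

        for label, members in by_class.items():
            members.sort(key=lambda j: distance(X[i], X[j]))
            nearest = members[:k]
            if label == y[i]:
                scale = -1.0  # nearest hits: penalise features that differ
            else:
                # Nearest misses: reward features that differ, weighted by
                # the prior of the other class.
                scale = prior[label] / (1.0 - prior[y[i]])
            for j in nearest:
                for f in range(d):
                    diff = abs(X[i][f] - X[j][f])
                    weights[f] += scale * diff / (n * len(nearest))
    return weights


def top_features(weights, names, m):
    """Return the m feature names with the largest weights."""
    ranked = sorted(range(len(names)), key=lambda f: -weights[f])
    return [names[f] for f in ranked[:m]]


# Hypothetical toy data: 1 means the word occurs on the page.
VOCAB = ["goal", "match", "the", "stock", "market"]
PAGES = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
LABELS = ["sports", "sports", "sports", "finance", "finance", "finance"]

weights = relieff_weights(PAGES, LABELS, k=1)
print(top_features(weights, VOCAB, 2))
```

In this toy setup, words that occur in every page of one class and in no page of the other ("goal", "stock") receive the highest weights, while a word spread across both classes ("the") scores lower. In a full pipeline, only the top-ranked words would be passed on to the classifier, which shrinks the vocabulary space the abstract identifies as the main difficulty.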