Rules revisited: web page classification

  • Authors: Aristotelis Katsaris; Isambo Karali
  • Affiliations: University of Athens, Athens, Greece (both authors)
  • Venue: CI '07 Proceedings of the Third IASTED International Conference on Computational Intelligence
  • Year: 2007

Abstract

The importance of web page classification grows significantly with the continuous increase of information available on the Internet. Web page classification serves two purposes: filtering the enormous search space of the Web by considering only relevant pages when attempting to locate a specific kind of information, and providing semantic information when high-precision results are sought. To classify a web page, its structure should be considered together with its text content. In this paper, we present our approach, which addresses the problem by using derivation rules and heuristics together with an analysis of the web page structure at a high semantic level. This approach was implemented in the ExpertCat system.
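
To illustrate the general idea of combining page structure with text content in hand-written rules, here is a minimal sketch in Python. It is not the ExpertCat implementation, whose rules and heuristics are not given in this abstract. The category names, tag thresholds, and keyword sets in `RULES`, as well as the helpers `extract_features` and `classify`, are hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collects start-tag counts and the text found in a page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Structural evidence: how often each tag opens.
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1

    def handle_data(self, data):
        # Textual evidence: lower-cased text between tags.
        self.text_parts.append(data.lower())


def extract_features(html):
    """Return (tag counts, set of lower-case words) for an HTML string."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[a-z]+", " ".join(parser.text_parts)))
    return parser.tag_counts, words


# Hypothetical rules: (category, structural test, keywords expected in the text).
# A rule fires only when BOTH the structure and the text support it.
RULES = [
    ("product_listing", lambda tags: tags.get("table", 0) >= 1, {"price", "cart"}),
    ("link_directory", lambda tags: tags.get("a", 0) >= 5, {"links", "resources"}),
]


def classify(html):
    """Return the category of the first rule that fires, else 'unknown'."""
    tags, words = extract_features(html)
    for category, structure_ok, keywords in RULES:
        if structure_ok(tags) and keywords & words:
            return category
    return "unknown"


if __name__ == "__main__":
    page = "<html><body><table><tr><td>Price: 10 EUR</td></tr></table></body></html>"
    print(classify(page))
```

The rules are tried in order and the first match wins, which is one simple way to encode priorities among rules. A real system would likely need richer structural analysis than raw tag counts.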