Web page classification: a soft computing approach

  • Authors:
  • Angela Ribeiro;Víctor Fresno;María C. Garcia-Alegre;Domingo Guinea

  • Affiliations:
  • Industrial Automation Institute, Spanish Council for Scientific Research, Madrid, Spain;Escuela Superior de Ciencia y Tecnología, Universidad Rey Juan Carlos;Industrial Automation Institute, Spanish Council for Scientific Research, Madrid, Spain;Industrial Automation Institute, Spanish Council for Scientific Research, Madrid, Spain

  • Venue:
  • AWIC'03 Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence
  • Year:
  • 2003

Abstract

The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth of the Net has produced a poorly organized environment that hinders the sharing and mining of useful data. The need for meaningful web-page classification techniques is therefore becoming urgent. This paper describes a novel approach to web-page classification based on a fuzzy representation of web pages. Each page is described by a doublet representation that associates a weight with each of the most representative words of the web document, so as to characterize the relevance of that word in the document. This weight is derived by taking advantage of the characteristics of the HTML language. A fuzzy-rule-based classifier is then generated by a supervised learning process that uses a genetic algorithm to search for the minimum fuzzy-rule set that best covers the training examples. The proposed system has been demonstrated with two significantly different classes of web pages.
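
The doublet representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which HTML characteristics the authors use or how the weights are computed, so everything specific below is an assumption made for illustration: the tag boosts in TAG_WEIGHTS, the normalization of weights to the range (0, 1], and the names WordWeightParser and doublets are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Hypothetical tag boosts: assumed for illustration, not the paper's criteria.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5, "em": 1.5}
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}


class WordWeightParser(HTMLParser):
    """Accumulates a raw score per word, boosted by the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []               # stack of currently open tags
        self.scores = defaultdict(float)  # word -> accumulated raw score

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching tag; tolerate unbalanced markup.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # Each occurrence counts with the strongest boost among its open tags.
        boost = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += boost


def doublets(html, top_k=5):
    """Return up to top_k (word, weight) pairs, weights scaled into (0, 1]."""
    parser = WordWeightParser()
    parser.feed(html)
    parser.close()
    if not parser.scores:
        return []
    peak = max(parser.scores.values())
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(parser.scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, score / peak) for word, score in ranked[:top_k]]
```

Calling doublets(page_html) on a page's markup returns a short list of (word, weight) pairs, which is the kind of input a fuzzy-rule-based classifier could take. A real system would presumably also remove stop words and apply stemming, and the rule-learning step driven by the genetic algorithm is not sketched here.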