A fuzzy system for the web page representation

  • Authors:
  • Angela Ribeiro;Victor Fresno;María C. Garcia-Alegre;Domingo Guinea

  • Affiliations:
  • Industrial Automation Institute, Spanish Council for Scientific Research. 28500 Arganda del Rey. Madrid. Spain;Industrial Automation Institute, Spanish Council for Scientific Research. 28500 Arganda del Rey. Madrid. Spain;Industrial Automation Institute, Spanish Council for Scientific Research. 28500 Arganda del Rey. Madrid. Spain;Industrial Automation Institute, Spanish Council for Scientific Research. 28500 Arganda del Rey. Madrid. Spain

  • Venue:
  • Intelligent exploration of the web
  • Year:
  • 2003

Abstract

This paper addresses the problem of representing a web page adequately for subsequent classification and data mining. The approach focuses on the textual part of web pages, which are represented by a two-dimensional vector. The vector components are sorted by the relevance of each word in the text. Two approaches to computing word relevance, analytical and fuzzy, are presented; both take advantage of characteristics of the HTML language. The two models are compared in learning and classification tasks to evaluate the suitability of each approach. The experiments show a clear improvement of the fuzzy method over the analytical one. The analytical and fuzzy approaches presented here are general, in the sense that any characteristic of web pages could be easily integrated at no additional cost.
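
As a rough illustration of ranking words by relevance using HTML cues, the sketch below scores each word with a weighted linear combination of three features: normalized frequency, presence in the title, and the share of occurrences inside emphasis tags. The abstract does not specify the features, weights, or formula used in the paper, so the feature set, the weights in `WEIGHTS`, and the names `WordFeatureParser` and `rank_words` are all assumptions made for this example. It is a stand-in for an analytical-style scoring, not the authors' method; a fuzzy variant would replace the scoring expression with rule-based inference.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for (frequency, title, emphasis); not taken from the paper.
WEIGHTS = (0.5, 0.3, 0.2)
# Hypothetical choice of tags treated as "emphasis".
EMPHASIS_TAGS = {"b", "strong", "em", "i", "h1", "h2", "h3"}


class WordFeatureParser(HTMLParser):
    """Counts, per word, total occurrences and occurrences in title/emphasis tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.freq = Counter()
        self.in_title = Counter()
        self.emphasized = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching tag; ignore stray end tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        titled = "title" in self.open_tags
        emphasized = any(t in EMPHASIS_TAGS for t in self.open_tags)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.freq[word] += 1
            if titled:
                self.in_title[word] += 1
            if emphasized:
                self.emphasized[word] += 1


def rank_words(html):
    """Return (word, score) pairs sorted by decreasing score, then alphabetically."""
    parser = WordFeatureParser()
    parser.feed(html)
    parser.close()
    max_freq = max(parser.freq.values(), default=0)
    if max_freq == 0:
        return []
    w_freq, w_title, w_emph = WEIGHTS
    scored = []
    for word, count in parser.freq.items():
        score = (
            w_freq * count / max_freq
            + w_title * (1.0 if parser.in_title[word] else 0.0)
            + w_emph * parser.emphasized[word] / count
        )
        scored.append((word, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    page = (
        "<html><head><title>Fuzzy web</title></head>"
        "<body><p>fuzzy <b>fuzzy</b> page page page web</p></body></html>"
    )
    for word, score in rank_words(page):
        print(f"{word}: {score:.3f}")
```

The output is a list of (word, relevance) pairs ordered by relevance, which mirrors the sorted two-dimensional representation described in the abstract. Adding another page characteristic under this scheme only requires one more counter and one more weighted term.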