An enhanced Web page change detection approach based on limiting similarity computations to elements of same type

  • Authors:
  • Hassan Artail;Michel Abi-Aad

  • Affiliations:
  • Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon 1107 2020 (both authors)

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2009

Abstract

This paper describes an efficient Web page change detection approach that restricts the similarity computations between two versions of a given Web page to nodes with the same HTML tag type. Before the similarity computations are performed, the HTML page is transformed into an XML-like structure in which each node corresponds to a pair of opening and closing HTML tags. Analytical expressions and supporting experimental results quantify the improvements of the proposed approach over the traditional one, which computes similarities across all nodes of both pages. The improvements are shown to depend strongly on the diversity of tags in the page: the more diverse the page (i.e., the more it mixes text, images, links, etc.), the greater the improvements, and the more uniform the page, the smaller they are.
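To see why tag diversity drives the savings, it helps to count comparisons. The traditional approach compares every node of the old version with every node of the new one, which takes N_old × N_new comparisons. Restricting comparisons to nodes of the same tag type t takes the sum over t of n_old(t) × n_new(t). The Python sketch below counts both quantities. It is not the authors' implementation. It assumes that each element start tag stands in for one node of the XML-like tree, and it leaves out the similarity measure itself. The names `TagCollector`, `tag_counts`, `all_pairs_comparisons` and `same_type_comparisons` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the tag name of every element start tag in the page.

    Assumption: each start tag stands in for one node of the XML-like tree
    (an opening/closing tag pair); void elements such as <img> count too.
    """

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so "P" and "p" match.
        self.tags.append(tag)


def tag_counts(html: str) -> Counter:
    """Hypothetical helper: number of nodes per tag type in one page version."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return Counter(collector.tags)


def all_pairs_comparisons(old: Counter, new: Counter) -> int:
    """Traditional approach: every old node is compared with every new node."""
    return sum(old.values()) * sum(new.values())


def same_type_comparisons(old: Counter, new: Counter) -> int:
    """Restricted approach: only nodes with the same tag type are compared."""
    # Counter returns 0 for a missing tag, so tags present in only one
    # version contribute no comparisons.
    return sum(count * new[tag] for tag, count in old.items())


if __name__ == "__main__":
    # Toy page pairs (invented for illustration, not data from the paper).
    pages = {
        "diverse": (
            "<div><p>a</p><img src='x'><a href='#'>l</a></div>",
            "<div><p>a</p><p>b</p><img src='x'><a href='#'>l</a></div>",
        ),
        "uniform": (
            "<ul><li>1</li><li>2</li><li>3</li></ul>",
            "<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>",
        ),
    }
    for label, (old_html, new_html) in pages.items():
        old, new = tag_counts(old_html), tag_counts(new_html)
        print(f"{label}: all-pairs={all_pairs_comparisons(old, new)}, "
              f"same-type={same_type_comparisons(old, new)}")
```

In both toy pairs, the old version has 4 nodes and the new version has 5, so the all-pairs strategy makes 20 comparisons each time. The same-type restriction cuts this to 5 for the mixed page but only to 13 for the list-only page. That matches the abstract's observation that the savings shrink as the page becomes more uniform.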