An enhanced Web page change detection approach based on limiting similarity computations to elements of same type

Authors:
Hassan Artail;Michel Abi-Aad
Affiliations:
Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon 1107 2020;Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon 1107 2020
Venue:
Journal of Intelligent Information Systems
Year:
2009



Abstract

This paper describes an efficient Web page change detection approach that restricts the similarity computations between two versions of a given Web page to nodes with the same HTML tag type. Before the similarity computations are performed, the HTML page is transformed into an XML-like structure in which each node corresponds to a pair of opening and closing HTML tags. Analytical expressions and supporting experimental results quantify the improvement of the proposed approach over the traditional one, which computes similarities across all nodes of both pages. The improvement depends strongly on the diversity of tags in the page: the more diverse the page (i.e., the more it mixes text, images, links, etc.), the greater the improvement, and the more uniform the page, the smaller the improvement.
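
The effect of tag diversity on the number of similarity computations can be illustrated with a rough sketch. This is not the authors' implementation. It only counts candidate node pairs and assumes that every node of one version is a candidate match for every node of the other (traditional approach) or for every node with the same tag (proposed approach). The names TagCounter, tag_counts and comparison_counts are hypothetical, the similarity measure itself is left out, and void tags such as img are counted as nodes for simplicity.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many elements of each tag type a page contains."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Each start tag is treated as one node (an assumption of this sketch).
        self.counts[tag] += 1


def tag_counts(html):
    """Return a Counter mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def comparison_counts(old_html, new_html):
    """Return (all_pairs, same_type_pairs) for two versions of a page.

    all_pairs: every node of the old version paired with every node of the new one.
    same_type_pairs: only pairs of nodes that share the same tag.
    """
    old, new = tag_counts(old_html), tag_counts(new_html)
    all_pairs = sum(old.values()) * sum(new.values())
    same_type_pairs = sum(n * new[tag] for tag, n in old.items())
    return all_pairs, same_type_pairs


# A page with mixed content versus a page made of a single tag type.
mixed = "<p>a</p><p>b</p><a href='x'>link</a><img src='i.png'>"
uniform = "<p>a</p><p>b</p><p>c</p><p>d</p>"

print(comparison_counts(mixed, mixed))      # (16, 6)
print(comparison_counts(uniform, uniform))  # (16, 16)
```

Under these assumptions the mixed page needs 6 same-type pairs instead of 16, while the uniform page gains nothing, which matches the dependence on tag diversity stated in the abstract.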