A dataflow approach to efficient change detection of HTML/XML documents in WebVigiL

  • Authors:
  • Anoop Sanka;Shravan Chamakura;Sharma Chakravarthy

  • Affiliations:
  • Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX;Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX;Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX

  • Venue:
  • Computer Networks: The International Journal of Computer and Telecommunications Networking - Web dynamics
  • Year:
  • 2006

Abstract

The burgeoning data on the Web makes it difficult to keep track of the changes that constantly occur to specific information of interest. Currently, the most widespread way of detecting changes to Web content is to periodically retrieve the pages of interest and check them for changes. This approach puts the burden on the user and wastes time and resources. Alternatively, systems that report every change to a page are overkill, since they present information that may not be relevant. Timeliness of change detection is also an issue in this approach. In this paper, we present a change-monitoring system, WebVigiL, which efficiently monitors user-specified Web pages for customized changes and notifies the user in a timely manner. The focus of this paper is on the dataflow approach used for detecting multiple types of changes to a page and for monitoring changes to more than one page at a time. This approach has been optimized to group similar or identical specifications so that the computation of changes is reduced. Multiple changes to the same page as well as to different pages are handled in our approach. As a special case, this includes the monitoring of Web pages containing frames. We also provide the overall architecture and functionality of the WebVigiL system to highlight the role of the change detection graph (CDG), which forms the core of the system.
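
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the WebVigiL implementation: the `Sentinel` class, the `build_graph` and `detect` functions, and the detector callables are hypothetical names chosen for this example. The sketch only assumes what the abstract states, namely that requests naming the same page and the same kind of change can share one computation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentinel:
    """Hypothetical user request: watch `url` for changes of kind `change_type`."""
    user: str
    url: str
    change_type: str


def build_graph(sentinels):
    """Group requests by page, then by change type.

    Returns {url: {change_type: [users]}}, a simplified stand-in for a
    change detection graph in which identical specifications share a node.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for s in sentinels:
        graph[s.url][s.change_type].append(s.user)
    return {url: dict(types) for url, types in graph.items()}


def detect(graph, old_pages, new_pages, detectors):
    """Push each new page version through the graph and collect notifications."""
    notifications = []
    for url, types in graph.items():
        old, new = old_pages[url], new_pages[url]
        for change_type, users in types.items():
            # One detector call serves every user grouped under this node.
            if detectors[change_type](old, new):
                notifications.extend((user, url, change_type) for user in users)
    return notifications
```

With two users watching the same page for the same change type, the detector runs once and both users are notified; a request on a different page or change type gets its own node.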