Keeping Up with the Changing Web

  • Authors:
  • Brian E. Brewington; George Cybenko

  • Venue:
  • Computer
  • Year:
  • 2000

Quantified Score

Hi-index 4.10

Abstract

Because information depreciates over time, keeping Web pages current presents new design challenges. This article quantifies what "current" means for Web search engines and estimates how often they must reindex the Web to keep current with its changing pages and structure.

Most information--from a newspaper story to a temperature sensor measurement to a Web page--is dynamic. When monitoring an information source, when do our previous observations become stale and need refreshing? How can we schedule these refresh operations to satisfy a required level of currency without violating resource constraints--such as bandwidth or computing limitations on how much data can be observed in a given time?

The authors investigate the trade-offs involved in monitoring dynamic information sources and discuss the Web in detail, estimating how fast documents change and exploring what constitutes a "current" Web index. For a simple class of Web-monitoring systems--search engines--they combine their idea of currency with actual measured data to estimate revisit rates.