Keeping Up with the Changing Web

Authors:
Brian E. Brewington; George Cybenko
Venue:
Computer
Year:
2000

Abstract

Because information depreciates over time, keeping Web pages current presents new design challenges. This article quantifies what "current" means for Web search engines and estimates how often they must reindex the Web to keep up with its changing pages and structure.

Most information, from a newspaper story to a temperature sensor measurement to a Web page, is dynamic. When monitoring an information source, when do our previous observations become stale and need refreshing? How can we schedule these refresh operations to satisfy a required level of currency without violating resource constraints, such as bandwidth or computing limits on how much data can be observed in a given time?

The authors investigate the trade-offs involved in monitoring dynamic information sources and discuss the Web in detail, estimating how fast documents change and exploring what constitutes a "current" Web index. For one simple class of Web-monitoring systems (search engines), they combine their notion of currency with measured data to estimate revisit rates.
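The trade-off between currency and revisit rate can be made concrete with a small model. The sketch below is an illustrative assumption, not the authors' stated method: a page is taken to change as a Poisson process with rate lam, the engine re-fetches it every T time units, and an indexed copy counts as "current" if the page has not changed since it was fetched, ignoring changes made in the last beta time units (a grace period). Averaging over a uniformly random query time within one revisit interval gives

P(current) = (beta + (1 - exp(-lam * (T - beta))) / lam) / T for T > beta, and 1 otherwise.

This probability decreases as T grows, so bisection can find the longest revisit period that keeps it at or above a target alpha. The function names prob_beta_current and max_revisit_period are hypothetical.

```python
import math


def prob_beta_current(lam: float, T: float, beta: float) -> float:
    """Probability that an indexed copy is current at a random query time.

    Hypothetical model (an assumption for illustration):
    - the page changes as a Poisson process with rate `lam`;
    - the page is re-fetched every `T` time units;
    - a copy is current if no change occurred between the last fetch and
      `beta` time units before the query (beta is a grace period).
    The query time is uniform over one revisit interval.
    """
    if lam < 0 or T <= 0 or beta < 0:
        raise ValueError("need lam >= 0, T > 0, beta >= 0")
    if T <= beta or lam == 0:
        return 1.0
    # Elapsed time s <= beta: current by definition (contributes beta).
    # Elapsed time s > beta: current iff no change in the first s - beta
    # units after the fetch, probability exp(-lam * (s - beta)); integrating
    # that over s in (beta, T] gives the tail term below.
    tail = (1.0 - math.exp(-lam * (T - beta))) / lam
    return (beta + tail) / T


def max_revisit_period(lam: float, beta: float, alpha: float,
                       hi: float = 1e6, iters: int = 200) -> float:
    """Largest revisit period T with prob_beta_current(lam, T, beta) >= alpha.

    Uses bisection, relying on the probability being non-increasing in T.
    `hi` is an assumed upper bound on the search range.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if lam == 0:
        return math.inf  # a page that never changes never goes stale
    if prob_beta_current(lam, hi, beta) >= alpha:
        return hi
    # Any T <= beta is always current; start just above zero otherwise.
    lo = max(beta, 1e-12)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if prob_beta_current(lam, mid, beta) >= alpha:
            lo = mid  # target still met: try revisiting less often
        else:
            hi = mid  # target missed: must revisit more often
    return lo
```

For example, max_revisit_period(0.1, 1.0, 0.9) asks how often to revisit a page that changes about once every 10 days if a one-day grace period is allowed and 90% of queries should see a current copy; under this model the answer is a little under four days. Real change rates would have to be estimated from observations, which this sketch does not attempt.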