Because information depreciates over time, keeping Web pages current presents new design challenges. This article quantifies what "current" means for Web search engines and estimates how often they must reindex the Web to keep up with its changing pages and structure.

Most information, from a newspaper story to a temperature-sensor measurement to a Web page, is dynamic. When monitoring an information source, when do our previous observations become stale and need refreshing? How can we schedule these refresh operations to satisfy a required level of currency without violating resource constraints, such as bandwidth or computing limits on how much data can be observed in a given time?

The authors investigate the trade-offs involved in monitoring dynamic information sources and discuss the Web in detail, estimating how fast documents change and exploring what constitutes a "current" Web index. For one simple class of Web-monitoring systems, search engines, they combine their definition of currency with actual measured data to estimate revisit rates.
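The abstract does not spell out its model, so the following is only a minimal sketch of how "currency" and a required revisit rate could be connected, under assumptions that are ours rather than the authors': each page changes according to a Poisson process with rate `lam`, the crawler re-fetches it every `period` days, and an indexed copy counts as current if the page has not changed since the fetch, apart from changes newer than a hypothetical grace period `beta`. The names `prob_beta_current` and `max_revisit_period` are illustrative, not taken from the article.

```python
import math


def prob_beta_current(lam: float, period: float, beta: float = 0.0) -> float:
    """Probability that an indexed copy is current when checked at a random moment.

    Hypothetical model (an assumption, not taken from the article):
      * the page changes according to a Poisson process with rate `lam`
        (expected changes per day);
      * the crawler re-fetches the page every `period` days;
      * the copy is checked at a moment uniformly distributed over the cycle;
      * a copy fetched u days ago counts as current if the page did not change
        during the first u - beta days after the fetch, i.e. changes newer
        than the grace period `beta` are tolerated.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if lam <= 0 or beta >= period:
        return 1.0  # page never changes, or the whole cycle is within the grace period
    # Moments with u <= beta are always current.
    grace_part = beta / period
    # For u in [beta, period]: P(no change in u - beta days) = exp(-lam * (u - beta)).
    # Averaging over u gives (1 - exp(-lam * (period - beta))) / (lam * period);
    # expm1 keeps this accurate when lam * period is small.
    tail_part = -math.expm1(-lam * (period - beta)) / (lam * period)
    return grace_part + tail_part


def max_revisit_period(lam: float, alpha: float, beta: float = 0.0,
                       rel_tol: float = 1e-9) -> float:
    """Longest revisit period whose currency probability is still at least `alpha`.

    Uses bisection, which is valid because prob_beta_current decreases in
    `period` once period >= beta.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if lam <= 0:
        return math.inf  # a page that never changes never needs revisiting
    lo = max(beta, 1e-12)  # currency is (essentially) 1 here, so >= alpha
    hi = 2.0 * lo + 1.0
    while prob_beta_current(lam, hi, beta) >= alpha:
        hi *= 2.0  # grow the bracket until currency drops below alpha
    # Invariant: currency at lo is >= alpha, currency at hi is < alpha.
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if prob_beta_current(lam, mid, beta) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    lam = 1.0 / 7.0  # assumed: a page that changes about once a week
    for beta in (0.0, 1.0):
        period = max_revisit_period(lam, alpha=0.9, beta=beta)
        print(f"beta={beta} days -> revisit at most every {period:.3f} days")
```

Under these assumptions, a page that changes about once a week must be revisited roughly every 1.5 days to be current 90% of the time when no grace period is allowed, and tolerating one day of staleness lengthens the admissible interval. Bisection is used because, once `beta` is nonzero, the required period has no simple closed form; real change rates would have to be estimated from measured data, as the article does.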