Sic transit gloria telae: towards an understanding of the web's decay

Authors:
Ziv Bar-Yossef; Andrei Z. Broder; Ravi Kumar; Andrew Tomkins
Affiliations:
IBM Almaden Research Center, San Jose, CA; IBM T. J. Watson Research Center, Hawthorne, NY; IBM Almaden Research Center, San Jose, CA; IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the 13th international conference on World Wide Web
Year:
2004

Abstract

The rapid growth of the web has been noted and tracked extensively. Recent studies, however, have documented the dual phenomenon: web pages have short half-lives, and thus the web exhibits rapid death as well. Consequently, page creators face an increasingly burdensome task of keeping links up to date, and many are falling behind. Beyond individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers seeking a way out of these stale neighborhoods and back to more up-to-date sections of the web. Measuring the decay of a page purely on the basis of the dead links on that page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure through a number of validations and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
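To make the contrast between a page-local dead-link count and a neighborhood-aware notion of decay concrete, here is a minimal sketch under stated assumptions. It is not the paper's definition or algorithm. It assumes a random-walk formulation in which a walker starting at a page stops with probability 1 - `sigma` at each step and otherwise follows a uniformly random out-link, and a page's score is the probability that the walk reaches a dead page before stopping. The graph representation, the `dead` set, `SIGMA`, `iters`, and both function names are hypothetical illustrations.

```python
"""Hypothetical sketch: a naive dead-link score versus a random-walk decay score.

Assumptions (not taken from the paper): the web graph is a dict mapping each URL
to its list of out-link URLs, `dead` is the set of URLs that no longer resolve,
and SIGMA is the probability that the walker keeps following links.
"""
from typing import Dict, List, Set

SIGMA = 0.9  # hypothetical continuation probability


def naive_decay(graph: Dict[str, List[str]], dead: Set[str], page: str) -> float:
    """Fraction of the page's own out-links that are dead (ignores the neighborhood)."""
    links = graph.get(page, [])
    if not links:
        return 0.0
    return sum(1 for q in links if q in dead) / len(links)


def random_walk_decay(graph: Dict[str, List[str]], dead: Set[str],
                      sigma: float = SIGMA, iters: int = 200) -> Dict[str, float]:
    """Approximate, for every page, the probability that a walk reaches a dead page.

    At each step the walker stops with probability 1 - sigma; otherwise it follows
    a uniformly random out-link. Dead pages score 1 and live pages with no
    out-links score 0. The scores are computed by fixed-point iteration, which
    converges because sigma < 1.
    """
    pages = set(graph) | {q for links in graph.values() for q in links} | set(dead)
    score = {p: (1.0 if p in dead else 0.0) for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            if p in dead:
                new[p] = 1.0
                continue
            links = graph.get(p, [])
            # Expected score of a random out-neighbor, discounted by the chance of stopping.
            new[p] = sigma * sum(score[q] for q in links) / len(links) if links else 0.0
        score = new
    return score


if __name__ == "__main__":
    # A hub whose own links all resolve, but whose neighbors point only at dead pages.
    graph = {"hub": ["a", "b"], "a": ["x", "y"], "b": ["x", "y"]}
    dead = {"x", "y"}
    print(naive_decay(graph, dead, "hub"))                   # 0.0
    print(round(random_walk_decay(graph, dead)["hub"], 3))   # 0.81
```

In this toy graph the naive score reports that `hub` has no decay at all, while the random-walk score (0.9 × 0.9 = 0.81) reflects that a visitor leaving `hub` almost always ends up at a dead page. The plain fixed-point iteration here costs one pass over all links per iteration and is chosen only for clarity; it is not the efficient algorithm the paper refers to.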