Modelling information persistence on the web

Authors: Daniel Gomes; Mário J. Silva
Affiliations: Universidade de Lisboa, Faculdade de Ciências, Portugal (both authors)
Venue: ICWE '06 Proceedings of the 6th international conference on Web engineering
Year: 2006

Abstract

Models of web data persistence are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data by measuring URL and content persistence across several snapshots of a national community web, collected over 3 years. We found that the lifetimes of URLs and contents are modelled by logarithmic functions. We gathered statistics on the structure of the web, identified reasons for URL death and characterized persistent URLs and contents. Lasting contents tend to be referenced by different URLs during their lifetime, while half of the contents referenced by persistent URLs do not change.
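The kind of measurement described above can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: a snapshot is a mapping from URL to page text, a content is identified by a SHA-256 digest of its text, persistence is the fraction of the first snapshot's URLs (or contents) still found in a later snapshot, and the logarithmic model has the form y = a + b ln(t), fitted by ordinary least squares. The names `persistence`, `fit_log`, `content_id` and `SNAPSHOTS` are hypothetical, and the toy data are invented for illustration.

```python
import hashlib
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy representation: one crawl snapshot maps URL -> page text.
Snapshot = Dict[str, str]


def content_id(text: str) -> str:
    """Identify a content by a digest of its bytes (assumed identity rule)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def persistence(snapshots: Sequence[Snapshot]) -> Tuple[List[float], List[float]]:
    """For each snapshot after the first, return the fraction of the first
    snapshot's URLs and contents that are still present."""
    base_urls = set(snapshots[0])
    base_contents = {content_id(t) for t in snapshots[0].values()}
    url_frac: List[float] = []
    content_frac: List[float] = []
    for snap in snapshots[1:]:
        url_frac.append(len(base_urls & set(snap)) / len(base_urls))
        # Contents are matched by digest, whatever URL references them, so a
        # page that moved to a new URL still counts as a persistent content.
        contents = {content_id(t) for t in snap.values()}
        content_frac.append(len(base_contents & contents) / len(base_contents))
    return url_frac, content_frac


def fit_log(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*ln(x); requires x > 0.
    Returns (a, b)."""
    lx = [math.log(v) for v in x]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(y) / len(y)
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (w - mean_y) for u, w in zip(lx, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Toy data: four snapshots taken one time unit apart (invented for illustration).
SNAPSHOTS: List[Snapshot] = [
    {"/1": "x", "/2": "y", "/3": "z", "/4": "w"},
    {"/1": "x", "/2": "y2", "/3": "z", "/9": "w"},  # content "w" moved to a new URL
    {"/1": "x", "/3": "z2", "/8": "w"},              # URL "/3" persists, content changed
    {"/1": "x", "/7": "w"},
]

if __name__ == "__main__":
    url_frac, content_frac = persistence(SNAPSHOTS)
    print(url_frac)      # [0.75, 0.5, 0.25]
    print(content_frac)  # [0.75, 0.5, 0.5]
    a, b = fit_log([1, 2, 3], url_frac)
    print(f"URL persistence ~ {a:.3f} + {b:.3f} ln(age)")
```

Matching contents by digest rather than by URL is what lets the sketch count a page that moved to a new address as persistent, which mirrors the observation that lasting contents tend to be referenced by different URLs during their lifetime. A real study would also need a policy for near-duplicates and for choosing the reference snapshot; this sketch simply uses exact digests and the first snapshot.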