Estimating evolution of freshness in Internet cache directories under the capture-recapture methodology

Authors:
Ioannis Anagnostopoulos;Christos Anagnostopoulos;Dimitrios D. Vergados
Affiliations:
Department of Information and Communications Systems Engineering, University of the Aegean, Karlovassi, 83200 Samos, Greece;Department of Cultural Technology and Communication, University of the Aegean, Mytilene, 81100 Lesvos, Greece;Department of Informatics, University of Piraeus, 80 Karaoli and Dimitriou St., GR-185 34 Piraeus, Greece
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking
Year:
2010

Citing 11
Cited 0

Synchronizing a database to improve freshness

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
How dynamic is the Web?

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Web page change and persistence: a four-year longitudinal study

Journal of the American Society for Information Science and Technology
The Evolution of the Web and Implications for an Incremental Crawler

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
What's new on the web?: the evolution of the web from a search engine perspective

Proceedings of the 13th international conference on World Wide Web
A large-scale study of the evolution of web pages

Software—Practice & Experience - Special issue: Web technologies
The freshness of web search engine databases

Journal of Information Science
Web dynamics and their ramifications for the development of web search engines

Computer Networks: The International Journal of Computer and Telecommunications Networking - Web dynamics
Rate of change and other metrics: a live study of the world wide web

USITS'97 Proceedings of the USENIX Symposium on Internet Technologies and Systems
A three-year study on the freshness of web search engine databases

Journal of Information Science
An empirical study on the change of web pages

APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development

Abstract

In this paper, we describe a new web sampling scheme for measuring the evolution of freshness in search engines. The methodology is capture-recapture, which is mainly applied to estimate evolution rates in wildlife biology studies. After the modifications and amendments needed to apply it to the web, we conducted three capture-recapture experiments of different durations over the caches of Google and MSN. In parallel, we used a typical sampling scheme, similar to many web sampling approaches in the literature, to evaluate the robustness of our proposal. The paper provides the implementation details of a web-based capture-recapture model along with its assessment. The results show that the capture-recapture methodology lets us not only measure the freshness of the tested search services but also monitor its evolution over time, with substantially fewer sampling instances. It was not our intention to compare the performance of Google and MSN. However, our experiments showed that although one sometimes presents better refresh rates than the other, in general both search services have virtually equal capabilities in refreshing their directories and providing new and up-to-date results to their users.
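
For readers unfamiliar with capture-recapture, the basic idea can be sketched with the classic two-occasion Lincoln-Petersen estimator and its Chapman correction. This is only an illustration of the general technique: the abstract does not state which capture-recapture model the authors use, and the function names and the numbers in the usage note are hypothetical. In a web setting, one could think of "marked" items as cached pages recorded in a first sample and "recaptured" items as those seen again in a second sample.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Estimate a closed population size from two sampling occasions.

    marked_first  -- items captured and marked on the first occasion
    caught_second -- items captured on the second occasion
    recaptured    -- items in the second sample that were already marked

    Illustrative only; not taken from the paper.
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return marked_first * caught_second / recaptured


def chapman(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Chapman's bias-corrected variant, defined even with zero recaptures."""
    if recaptured < 0 or recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures must lie between 0 and the smaller sample")
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1
```

With hypothetical counts of 200 marked items, 150 items in the second sample, and 30 recaptures, `lincoln_petersen(200, 150, 30)` gives 1000.0, and `chapman(200, 150, 30)` gives roughly 978.06. Both estimators assume a closed population between the two occasions; tracking evolution over time, as the abstract describes, would need a model that allows items to enter and leave.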