In this paper, we describe a new web sampling scheme for measuring how the freshness of search engines evolves. The methodology is capture-recapture, which is mainly applied in wildlife biology studies to estimate evolution rates. After the modifications needed to apply it to the web, we conducted three capture-recapture experiments of different durations over the caches of Google and MSN. In parallel, we ran a typical sampling scheme, similar to many web sampling approaches in the literature, to evaluate the robustness of our proposal. The paper provides the implementation details of a web-based capture-recapture model along with its assessment. The results show that the capture-recapture methodology lets us not only measure the freshness of the tested search services but also monitor its evolution over time, with substantially fewer sampling instances. It was not our intention to compare the performance of Google and MSN. However, our experiments showed that although one sometimes presents better refresh rates than the other, in general both search services have virtually equal capabilities in refreshing their directories and providing new and up-to-date results to their users.
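For readers unfamiliar with capture-recapture, the basic idea can be illustrated with the classic two-sample estimators from wildlife studies. The abstract does not state which estimator the web-based model uses, so the following is only a minimal sketch under that assumption; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Classic two-sample estimate of population size.

    marked_first  : items captured and marked in the first sample
    caught_second : items captured in the second sample
    recaptured    : items in the second sample that were already marked
    """
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed for this estimator")
    return marked_first * caught_second / recaptured


def chapman(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Chapman's variant, which stays defined when there are no recaptures."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


# Hypothetical counts: 100 items marked, 80 caught later, 20 of them marked.
estimate = lincoln_petersen(100, 80, 20)
corrected = chapman(100, 80, 20)
print(estimate, round(corrected, 2))
```

In a web setting the "items" would be sampled pages and a "recapture" would be a page seen again in a later sample, but how the paper maps these notions onto search engine caches is described in its implementation details, not in this sketch.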