This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We tested how these engines updated 40 daily updated pages and 30 irregularly updated pages, using data from a time span of six weeks in the years 2005, 2006 and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.