This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified, respectively, in future crawls. With these methods, web database administrators can avoid needlessly requesting pages in a page group that cannot be downloaded or have not been modified. We postulate that the change behavior of a web page is strongly related to its past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals for 100 days and estimated the future change behavior of those pages. Evaluated against the actual change behavior of the pages, our estimation worked well.
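
To make the idea of estimating future change counts from past behavior concrete, the following is a minimal sketch, not the paper's actual method. It assumes that each crawl is an independent trial with a fixed per-crawl modification probability, estimated from the observed history with add-one smoothing, so that the number of modifications in the next n crawls follows a binomial distribution. The function names `estimate_rate` and `prob_k_events` and the sample history are hypothetical.

```python
from math import comb

def estimate_rate(history):
    """Per-crawl event probability from a list of booleans
    (True = the page was observed modified at that crawl).
    Add-one smoothing is an assumption made for this sketch."""
    return (sum(history) + 1) / (len(history) + 2)

def prob_k_events(p, n, k):
    """Binomial probability of exactly k events in n future crawls,
    assuming independent crawls with constant probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical history: 10 past crawls, 3 observed modifications.
history = [True, False, False, True, False,
           False, False, True, False, False]
p = estimate_rate(history)

# Distribution over 0..5 modifications in the next 5 crawls.
dist = [prob_k_events(p, 5, k) for k in range(6)]
for k, prob in enumerate(dist):
    print(f"P({k} modifications in 5 crawls) = {prob:.3f}")
```

The same two functions could be applied to download success instead of modification by feeding in a history of successful and failed requests. A crawler might then skip pages whose probability of zero modifications in the coming crawls exceeds some chosen threshold.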