An incremental crawler needs to know how web pages evolve and how their change frequencies relate to link-attributes such as indegree. This paper proposes a model for incremental crawling and reports an experiment that tests the correlation between change frequency and link-attributes by monitoring the evolution of all the link-attributes of the web pages within one website. In particular, we look closely at one special kind of page, named Index-pages. From the experiment, we draw four conclusions: (1) Pages with larger indegrees, outdegrees, or PageRank values change more often, and these link-attributes all approximately obey a power-law distribution. (2) The link-attributes of pages seldom change even though the pages themselves change. (3) A small proportion of the pages link to most of the vertices in the web graph. (4) The Index-pages link to a sizeable number of new pages in a website. These conclusions can be used to greatly improve the performance of an incremental crawler, which is a key component of general search engines and web information stores.
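As a rough illustration of how conclusion (1) might feed a revisit policy, the sketch below computes indegree and outdegree from a list of links and orders pages for recrawling by their combined degree. The scoring rule, the function names `link_attributes` and `revisit_order`, and the toy site are all assumptions made for this example; the abstract does not specify the paper's crawling model.

```python
from collections import Counter


def link_attributes(edges):
    """Return (indegree, outdegree) counters for a list of (source, target) links."""
    indegree = Counter()
    outdegree = Counter()
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


def revisit_order(pages, edges):
    """Order pages for recrawling, highest combined degree first.

    Assumption: indegree + outdegree stands in for change frequency.
    This score is illustrative only, not the paper's model.
    Ties are broken alphabetically so the order is deterministic.
    """
    indegree, outdegree = link_attributes(edges)
    return sorted(pages, key=lambda p: (-(indegree[p] + outdegree[p]), p))


# Hypothetical toy site: one index page linking to three content pages.
pages = ["index", "a", "b", "c"]
edges = [
    ("index", "a"), ("index", "b"), ("index", "c"),
    ("a", "index"), ("b", "index"), ("c", "a"),
]

indegree, outdegree = link_attributes(edges)
order = revisit_order(pages, edges)
print(order)
```

Because conclusion (2) says link-attributes seldom change, a crawler following this kind of policy could plausibly compute the scores once and reuse them across several crawl cycles rather than recomputing them on every pass.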