This paper presents the architectural design and evaluation results of an efficient Web-crawling system. The design comprises a fully distributed architecture, a URL allocation algorithm, and a method for ensuring system scalability and dynamic reconfigurability. Simulation experiments show that the system achieves load balance, scalability, and efficiency. The distributed Web-crawling subsystem has been successfully integrated with WebGather, a well-known Chinese and English Web search engine that aims to collect all Web pages in China and keep pace with the rapid growth of Chinese Web information. We also believe that the design can be useful in other contexts, such as digital libraries.
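The abstract does not spell out the URL allocation algorithm, so the following is only a minimal sketch of how URL allocation with dynamic reconfiguration might look. It assumes consistent hashing on the host name as a stand-in technique, which is not confirmed to be the method used in the paper. The class `UrlAllocator`, the node names, and the `replicas` parameter are all hypothetical and introduced purely for illustration.

```python
import bisect
import hashlib
from urllib.parse import urlsplit


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (unlike the built-in hash())."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class UrlAllocator:
    """Hypothetical allocator: assigns each URL's host to one crawler node.

    Assumption: consistent hashing, chosen here only to illustrate load
    balance and reconfiguration; the paper's own algorithm may differ.
    """

    def __init__(self, nodes, replicas=64):
        self.replicas = replicas  # virtual points per node, to smooth the load
        self._ring = []           # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node's virtual points on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Drop all of the node's points; its hosts fall to the next points."""
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, url):
        """Return the node responsible for the URL's host."""
        if not self._ring:
            raise RuntimeError("no crawler nodes registered")
        host = urlsplit(url).hostname or url
        idx = bisect.bisect_left(self._ring, (_hash(host), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    alloc = UrlAllocator(["node-a", "node-b", "node-c"])
    print(alloc.node_for("http://example.cn/index.html"))
```

Hashing the host rather than the full URL keeps all pages of one site on the same node, which simplifies per-site politeness limits. When a node joins or leaves, only the hosts adjacent to its points on the ring change owner, so the remaining nodes keep their assignments.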