The anatomy of a large-scale hypertextual Web search engine. WWW7: Proceedings of the seventh international conference on World Wide Web 7.
SPHINX: a framework for creating personal, site-specific Web crawlers. WWW7: Proceedings of the seventh international conference on World Wide Web 7.
Efficient crawling through URL ordering. WWW7: Proceedings of the seventh international conference on World Wide Web 7.
Breadth-first crawling yields high-quality pages. Proceedings of the 10th international conference on World Wide Web.
Mercator: A scalable, extensible Web crawler. World Wide Web.
Focused Crawling Using Context Graphs. VLDB '00: Proceedings of the 26th International Conference on Very Large Data Bases.
Parallel web spiders for cooperative information gathering
GCC '05: Proceedings of the 4th international conference on Grid and Cooperative Computing
With the rapid growth of information on the Internet, web information collection is becoming increasingly important in many web applications, especially search engines. Because the performance of web information collectors strongly influences the quality of search engines, discussions of web spiders usually focus on their speed and accuracy. In this paper, we argue that customizability is also an important feature of a well-designed spider: a spider should be able to provide multi-modal services that satisfy different users with different requirements and preferences. We have developed a parallel web spider system based on multi-agent techniques. It runs with high speed and high accuracy and, most importantly, it provides its services from multiple perspectives and has good extensibility and personalized customizability.
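The abstract does not describe the system's internals, but the core idea of a parallel web spider (several crawler agents cooperating over a shared URL frontier) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the in-memory link graph stands in for the Web, and the function names, agent count, and breadth-style crawl policy are invented, not taken from the paper.

```python
import threading
import queue

# Toy link graph standing in for the Web, so the sketch runs offline.
# (Hypothetical data, not from the paper.)
LINK_GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}

def crawl_parallel(seeds, num_spiders=4):
    """Run several spider agents over a shared URL frontier.

    Each agent repeatedly takes a URL from the frontier, 'fetches' it
    (here: records it), and enqueues any out-links not yet seen.
    """
    frontier = queue.Queue()
    seen = set(seeds)
    seen_lock = threading.Lock()
    results = []          # list.append is atomic in CPython

    for url in seeds:
        frontier.put(url)

    def spider():
        while True:
            url = frontier.get()          # block until work is available
            results.append(url)           # stand-in for download/index step
            for link in LINK_GRAPH.get(url, []):
                with seen_lock:           # avoid crawling a URL twice
                    if link not in seen:
                        seen.add(link)
                        frontier.put(link)
            frontier.task_done()

    workers = [threading.Thread(target=spider, daemon=True)
               for _ in range(num_spiders)]
    for w in workers:
        w.start()
    frontier.join()                       # wait until every queued URL is done
    return results
```

A real multi-agent crawler would add politeness delays, per-host partitioning of the frontier, and the per-user customization the paper emphasizes; this sketch only shows the shared-frontier parallelism.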