With the enormous growth of the World Wide Web in recent years, discovering web pages efficiently has become an important challenge for web crawler designers. In this paper, we outline a simple model that predicts the distribution of the search depth a breadth-first search needs to reach the first web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies indicate that, at large scale, the Web structure follows power-law distributions in several respects [3][7]. Our work, however, models the microscopic linkage structure of the Web from an intelligent crawler's point of view. Using the information provided by crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
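To make the quantity concrete, the following is a minimal sketch, not the authors' predictive model. It estimates crawler confidence empirically by running a breadth-first search from each seed page over a small in-memory link graph and recording the depth at which the first relevant page appears. The adjacency-list graph, the relevance predicate, and the helper names `first_relevant_depth` and `crawler_confidence` are hypothetical. Here, confidence at depth d is assumed to be the fraction of seeds whose first relevant page lies within d hops, which may differ from the paper's exact definition.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Sequence


def first_relevant_depth(
    graph: Dict[str, List[str]],
    seed: str,
    is_relevant: Callable[[str], bool],
    max_depth: int,
) -> Optional[int]:
    """Breadth-first search from `seed` (hypothetical helper).

    Returns the depth (number of links followed) of the first relevant page,
    or None if no relevant page is reachable within `max_depth` hops.
    """
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        page, depth = queue.popleft()
        # BFS dequeues pages in nondecreasing depth, so the first hit is the shallowest.
        if is_relevant(page):
            return depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for link in graph.get(page, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return None


def crawler_confidence(
    graph: Dict[str, List[str]],
    seeds: Sequence[str],
    is_relevant: Callable[[str], bool],
    max_depth: int,
) -> List[float]:
    """Empirical P(first relevant page found within d hops) for d = 0..max_depth.

    Assumption: this cumulative, seed-averaged definition is illustrative only.
    """
    if not seeds:
        raise ValueError("need at least one seed page")
    hits_at_depth = [0] * (max_depth + 1)
    for seed in seeds:
        d = first_relevant_depth(graph, seed, is_relevant, max_depth)
        if d is not None:
            hits_at_depth[d] += 1
    # Accumulate per-depth hits into a cumulative distribution over depth.
    confidence, running = [], 0
    for hits in hits_at_depth:
        running += hits
        confidence.append(running / len(seeds))
    return confidence


if __name__ == "__main__":
    # Toy link graph; pages "d" and "f" are treated as relevant to the query.
    links = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": ["f"], "f": []}
    relevant = {"d", "f"}
    print(crawler_confidence(links, ["a", "b", "c", "e"], relevant.__contains__, 3))
    # prints [0.0, 0.5, 1.0, 1.0]
```

Because breadth-first search visits pages in nondecreasing depth order, each seed contributes exactly one sample: the depth of its shallowest relevant page. A curve that saturates at small d indicates that relevant pages tend to sit close to the seeds in this toy graph.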