The anatomy of a large-scale hypertextual Web search engine. In WWW7: Proceedings of the Seventh International Conference on World Wide Web.
Focused crawling: a new approach to topic-specific Web resource discovery. In WWW '99: Proceedings of the Eighth International Conference on World Wide Web.
SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
ACM Transactions on Internet Technology (TOIT).
Accelerated focused crawling through online relevance feedback. In Proceedings of the 11th International Conference on World Wide Web.
Using web structure for classifying and describing web pages. In Proceedings of the 11th International Conference on World Wide Web.
Mining the Web's Link Structure. Computer.
An Alternate Way to Rank Hyper-linked Web-Pages. In ICIT '06: Proceedings of the 9th International Conference on Information Technology.
FlexiRank: an algorithm offering flexibility and accuracy for ranking the web pages. In ICDCIT '05: Proceedings of the Second International Conference on Distributed Computing and Internet Technology.
An important component of any Web search engine is its crawler, also known as a robot or spider. An efficient set of crawlers makes a search engine more powerful, independently of its other performance factors such as its ranking algorithm, storage mechanism, and indexing techniques. In this paper, we propose an extended technique for crawling the World Wide Web (WWW) on behalf of a search engine: multiple crawlers working in parallel, combined with the mechanism of focused crawling (Chakrabarti et al., 1999a, 2002; Mukhopadhyay et al., 2006). In this approach, the structure of a website is divided into a number of levels, based on its hyperlink structure, for downloading web pages from that site (Chakrabarti et al., 1999b; Mukhopadhyay and Singh, 2004). The number of crawlers at each level is not fixed; rather, it is determined dynamically at execution time, on demand, by a threaded program, according to the number of hyperlinks on a specific web page. The paper also proposes a focused hierarchical crawling technique in which crawlers for different domains are created dynamically at runtime to crawl web pages while sharing resources.
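As a rough illustration of this dynamic, level-wise threading, the Python sketch below spawns one crawler thread per hyperlink found on a page, descending through an assumed level cutoff. It is a minimal sketch, not the paper's implementation: the names `MAX_LEVELS` and `is_relevant`, the seed URL, and the trivial relevance test are all placeholders, with `is_relevant` standing in for the focused-crawling topic classifier.

```python
# Minimal sketch of level-wise crawling with on-demand threads.
# MAX_LEVELS, is_relevant, and the seed URL are illustrative assumptions.
import threading
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_LEVELS = 3                     # assumed depth cutoff over the hyperlink structure
visited, lock = set(), threading.Lock()

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def is_relevant(url, html):
    # Placeholder for the focused-crawling relevance judgment;
    # a real crawler would apply a topic classifier here.
    return True

def crawl(url, level):
    with lock:
        if url in visited or level > MAX_LEVELS:
            return
        visited.add(url)
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
    except OSError:
        return
    if not is_relevant(url, html):
        return
    parser = LinkExtractor()
    parser.feed(html)
    # One crawler thread per hyperlink on this page: the number of
    # crawlers at the next level is decided here, at execution time.
    workers = [threading.Thread(target=crawl, args=(urljoin(url, link), level + 1))
               for link in parser.links]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

crawl("https://example.com/", level=1)
```

In this sketch the thread count per level simply equals the number of outlinks, which mirrors the on-demand creation described above; a production crawler would additionally bound the total number of threads (for example with a thread pool) and respect robots.txt and politeness delays.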