This paper presents an algorithm for selecting potential seed URLs for web crawlers using a gain-share scoring approach. We begin with a set of arbitrarily chosen tourism queries. Each query is submitted to N selected commercial search engines (SEs); the top m search results from each SE are retrieved, and each of these m results is manually evaluated and assigned a relevance score. For each of the m results, a gain-share score is computed from its hyperlink structure across the N ranked lists. A gain score is computed for each link present in each of the m results, and a portion of that gain score is propagated to the share score of the result. The updated share scores of the m results determine the potential set of seed URLs for web crawling. Experimental results on tourism-related web data illustrate the effectiveness of the proposed seed selection algorithm.
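The abstract does not give the scoring formulas, so the following Python sketch is only one plausible reading of the gain-share idea, not the authors' method. It assumes that a link's gain is the summed relevance of every result occurrence (across the N ranked lists) whose page contains that link, and that a result's share is its own relevance plus a fraction `alpha` of the gains of its links. The names `gain_share_scores`, `select_seeds`, `outlinks`, and `alpha` are hypothetical.

```python
from collections import defaultdict


def gain_share_scores(ranked_lists, outlinks, relevance, alpha=0.5):
    """Compute an illustrative share score for every result URL.

    ranked_lists: N lists, each holding the top-m result URLs of one engine.
    outlinks:     dict mapping a result URL to the set of links on that page.
    relevance:    dict mapping a result URL to its manual relevance score.
    alpha:        assumed fraction of a link's gain passed on to the result.
    """
    # Assumed gain rule: each occurrence of a result in a ranked list adds
    # that result's relevance to the gain of every link on its page.
    gain = defaultdict(float)
    for results in ranked_lists:
        for url in results:
            for link in outlinks.get(url, ()):
                gain[link] += relevance.get(url, 0.0)

    # Assumed share rule: own relevance plus alpha times the summed gain
    # of the links found on the result page.
    share = {}
    for results in ranked_lists:
        for url in results:
            if url not in share:
                link_gain = sum(gain[link] for link in outlinks.get(url, ()))
                share[url] = relevance.get(url, 0.0) + alpha * link_gain
    return share


def select_seeds(share, k):
    """Return the k URLs with the highest share score (ties broken by URL)."""
    return sorted(share, key=lambda url: (-share[url], url))[:k]
```

Under these assumptions, a result that appears in several engines' lists and shares links with other relevant results accumulates a higher share score, which is why it would be preferred as a crawl seed.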