Finding potential seeds through rank aggregation of web searches

  • Authors:
  • Rajendra Prasath;Pinar Oztürk

  • Affiliations:
  • Department of Computer and Information Science (IDI), Norwegian University of Science and Technology (NTNU), Trondheim, Norway;Department of Computer and Information Science (IDI), Norwegian University of Science and Technology (NTNU), Trondheim, Norway

  • Venue:
  • PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence
  • Year:
  • 2011

Abstract

This paper presents a potential seed selection algorithm for web crawlers using a gain-share scoring approach. Initially, we consider a set of arbitrarily chosen tourism queries. Each query is submitted to N selected commercial Search Engines (SEs); the top m search results from each SE are obtained, and each of these m results is manually evaluated and assigned a relevance score. For each of the m results, a gain-share score is computed using the hyperlink structure of the results across the N ranked lists. A gain score is computed for each link present in each of the m results, and a portion of that gain score is propagated to the share score of each of the m results. These updated share scores determine the potential set of seed URLs for web crawling. Experimental results on tourism-related web data illustrate the effectiveness of the proposed seed selection algorithm.
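
The abstract does not give the scoring formulas, so the following is only a minimal Python sketch of how a gain-share style aggregation could be organised, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the function name `gain_share_seeds`, the propagation fraction `alpha`, the cut-off `top_k`, the rule that a link's gain is the summed relevance of the results containing it, and the rule that a result's share is its own relevance plus `alpha` times the gain of its links.

```python
from collections import defaultdict


def gain_share_seeds(ranked_lists, alpha=0.5, top_k=2):
    """Pick candidate seed URLs from N ranked lists (hypothetical scoring).

    ranked_lists: one list per search engine; each entry is a tuple
    (url, relevance, outlinks), where relevance is the manual score.
    """
    # ASSUMPTION: the gain of a link is the total relevance of every
    # result that contains it, summed over all N ranked lists.
    gain = defaultdict(float)
    for results in ranked_lists:
        for _url, relevance, outlinks in results:
            for link in set(outlinks):
                gain[link] += relevance

    # ASSUMPTION: the share of a result is its own relevance plus a
    # fraction alpha of the gain of each link it contains, accumulated
    # over every list in which the result appears.
    share = defaultdict(float)
    for results in ranked_lists:
        for url, relevance, outlinks in results:
            propagated = sum(gain[link] for link in set(outlinks))
            share[url] += relevance + alpha * propagated

    # Highest share first; ties are broken alphabetically by URL.
    ranked = sorted(share.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _score in ranked[:top_k]]


# Toy data: two search engines (N = 2), two results each (m = 2).
se1 = [("a.example", 3, ["x", "y"]), ("b.example", 1, ["y"])]
se2 = [("a.example", 2, ["x"]), ("c.example", 2, ["y", "z"])]
seeds = gain_share_seeds([se1, se2], alpha=0.5, top_k=2)
```

With this toy input, `a.example` appears in both lists and links to the two highest-gain pages, so it ranks first, followed by `c.example`. The URLs, relevance values and link names are invented for the example and are not taken from the paper's experiments.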