Precision Evaluation of Search Engines

  • Authors:
  • Yi Shang; Longzhuang Li

  • Affiliations:
  • Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO 65211, USA (yshang@parc.com); Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO 65211, USA (ll059@mizzou.edu)

  • Venue:
  • World Wide Web
  • Year:
  • 2002

Abstract

In this paper, we present a general approach for statistically evaluating the precision of search engines on the Web. Search engines are evaluated in two steps based on a large number of sample queries: (a) computing relevance scores of the hits returned by each search engine, and (b) ranking the search engines based on a statistical comparison of the relevance scores. For computing relevance scores of hits, we study four relevance scoring algorithms. Three of them are variations of algorithms widely used in the traditional information retrieval field: cover density ranking, Okapi similarity measurement, and the vector space model. In addition, we develop a new three-level scoring algorithm to mimic commonly used manual evaluation approaches. For ranking the search engines in terms of precision, we apply a statistical metric called the probability of win. In our experiments, six popular search engines, AltaVista, Fast, Google, Go, iWon, and NorthernLight, were evaluated based on queries from two domains of interest: knowledge and data engineering, and parallel and distributed processing. The first query set contains 1726 queries collected from the index terms of papers published in the IEEE Transactions on Knowledge and Data Engineering. The second set contains 1383 queries collected from the index terms of papers published in the IEEE Transactions on Parallel and Distributed Systems. The search engines were queried and compared in two different search modes: the default search mode and the exact phrase search mode. Our experimental results show that the six search engines performed differently under different search modes and scoring methods. Overall, Google was the best; NorthernLight was mostly second in the default search mode, whereas iWon was mostly second in the exact phrase search mode.
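
To make the two-step methodology concrete, below is a minimal, self-contained Python sketch, not the authors' implementation: relevance scores for hypothetical hits are computed with a simple term-frequency vector-space cosine similarity (one of the scoring families named above), and two engines are then compared with a probability of win computed under a normal approximation of the difference in mean scores. The queries, hit texts, and the normal approximation are illustrative assumptions.

# Sketch of the two-step evaluation, assuming:
# (a) relevance scoring via term-frequency cosine similarity (vector space model),
# (b) probability of win approximated as P(mean score of A > mean score of B)
#     under a normal approximation of the difference in sample means.

import math
from collections import Counter
from statistics import mean, stdev


def cosine_score(query: str, document: str) -> float:
    """Vector-space relevance: cosine similarity of term-frequency vectors."""
    q_tf = Counter(query.lower().split())
    d_tf = Counter(document.lower().split())
    dot = sum(q_tf[t] * d_tf.get(t, 0) for t in q_tf)
    q_norm = math.sqrt(sum(v * v for v in q_tf.values()))
    d_norm = math.sqrt(sum(v * v for v in d_tf.values()))
    return dot / (q_norm * d_norm) if q_norm and d_norm else 0.0


def probability_of_win(scores_a: list, scores_b: list) -> float:
    """P(engine A beats engine B), assuming the difference of sample means is
    approximately normal (an illustrative stand-in for the paper's statistic)."""
    se = math.sqrt(stdev(scores_a) ** 2 / len(scores_a) +
                   stdev(scores_b) ** 2 / len(scores_b))
    if se == 0:
        return 0.5
    z = (mean(scores_a) - mean(scores_b)) / se
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    query = "parallel distributed processing"
    # Hypothetical top hits returned by two engines for the same query.
    hits_a = ["scalable parallel processing on distributed clusters",
              "distributed processing of parallel workloads"]
    hits_b = ["online shopping deals", "distributed ledger news"]
    scores_a = [cosine_score(query, h) for h in hits_a]
    scores_b = [cosine_score(query, h) for h in hits_b]
    print("P(A wins over B) =", round(probability_of_win(scores_a, scores_b), 3))

In the paper's setting, the per-hit scores would come from one of the four scoring algorithms applied to the top hits of thousands of queries, and the pairwise probabilities of win would then be aggregated to rank the six engines.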