Extracting accurate and complete results from search engines: Case study windows live

  • Authors:
  • Mike Thelwall

  • Affiliations:
  • School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Although designed for general Web searching, Webometrics and related research commercial search engines are also used to produce estimated hit counts or lists of URLs matching a query. Unfortunately, however, they do not return all matching URLs for a search and their hit count estimates are unreliable. In this article, we assess whether it is possible to obtain complete lists of matching URLs from Windows Live, and whether any of its hit count estimates are robust. As part of this, we introduce two new methods to extract extra URLs from search engines: automated query splitting and automated domain and TLD searching. Both methods successfully identify additional matching URLs but the findings suggest that there is no way to get complete lists of matching URLs or accurate hit counts from Windows Live, although some estimating suggestions are provided. © 2008 Wiley Periodicals, Inc.