Graduated locality-aware scheduling for search engines

  • Authors:
  • Jaimie Kelley, Christopher Stewart


  • Venue:
  • Proceedings of the Posters and Demo Track
  • Year:
  • 2012

Abstract

Search engines parse diverse, natural-language datasets in search of answers to a user query. Not only are they expected to find good answers, they must find them quickly. For public search engines like Bing and Google, answers that are returned slowly cost more and produce less revenue than answers returned within a second [3]. For private search engines like IBM's Watson [2], slow answers bound the number of queries that can be processed over the lifetime of the hardware. To meet these response-time demands, search engines scale out, i.e., they divide datasets across large server clusters and execute user queries in parallel across the cluster. An open research challenge is to reduce the number of scale-out servers needed to ensure fast response times. Recent research reduces scale-out by partially executing queries, checking intermediate results, and completing the query as soon as the results exceed a quality threshold [1]. Such partial query execution exploits redundancy within datasets: often, good answers can be found without parsing the entire dataset. IBM's Watson used partial query execution; it buzzed in during Jeopardy games only when intermediate answers exceeded a quality threshold [2]. As another example, Bing used fewer servers by executing queries only until intermediate results reached diminishing returns [1]. Along with measuring the quality of intermediate results, partial query execution requires a mechanism to explore the search dataset iteratively. This poster paper describes a new approach to iteratively explore search datasets.
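The partial query execution described above can be sketched in a few lines. This is a minimal illustration, not the authors' system: the partition layout, the `score` function, and the `threshold` value are all hypothetical placeholders standing in for a real engine's index shards and relevance model. The sketch scans the dataset one partition at a time, checks the best intermediate answer after each partition, and stops early once that answer clears the quality threshold.

```python
# Hypothetical sketch of partial query execution: scan the dataset in
# partitions and stop as soon as the best intermediate answer clears a
# quality threshold, rather than parsing the entire dataset.

def partial_query(query, partitions, score, threshold):
    """Return (best_answer, partitions_examined) once quality suffices."""
    best_answer, best_score = None, float("-inf")
    for examined, partition in enumerate(partitions, start=1):
        for doc in partition:
            s = score(query, doc)
            if s > best_score:
                best_answer, best_score = doc, s
        # Check intermediate results after each partition; exit early
        # if the current best answer is already good enough.
        if best_score >= threshold:
            return best_answer, examined
    return best_answer, len(partitions)  # exhausted the whole dataset

# Toy usage: "documents" scored by word overlap with the query.
partitions = [["cat videos", "dog food"], ["search engines scale out"]]
score = lambda q, d: len(set(q.split()) & set(d.split()))
answer, examined = partial_query("search engines", partitions, score, 2)
```

In the toy run, the second partition already contains an answer meeting the threshold, so the third-and-later partitions (if any) would never be parsed; this early exit is what lets the approach trade dataset coverage for fewer scale-out servers.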