Adaptive parallelism for web search

Authors:
Myeongjae Jeon; Yuxiong He; Sameh Elnikety; Alan L. Cox; Scott Rixner
Affiliations:
Rice University, Houston, TX; Microsoft Research, Redmond, WA; Microsoft Research, Redmond, WA; Rice University, Houston, TX; Rice University, Houston, TX
Venue:
Proceedings of the 8th ACM European Conference on Computer Systems
Year:
2013

Abstract

A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, however, the query is processed sequentially. Each server may be processing multiple queries concurrently. Even so, on modern multicore servers, parallelizing the processing of an individual query within a server may improve the user's experience by reducing that query's response time. In this paper, we describe the issues that make parallelizing an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Because each server may be processing multiple queries concurrently, we also present an adaptive resource management algorithm that chooses the degree of parallelism for each query at run time, taking into account system load and parallelization efficiency. As a result, the servers execute queries with a high degree of parallelism at low load, gracefully reduce the degree of parallelism as load increases, and choose sequential execution under high load. We have implemented our parallelization approach and adaptive resource management algorithm in Bing servers and evaluated them experimentally with production workloads. The experimental results show that the mean and 95th-percentile response times for queries are reduced by more than 50% under light or moderate load. Under high load, where parallelization would degrade system performance, response times remain the same as with sequential execution. In all cases, we observe no degradation in the relevance of the search results.
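The abstract says the degree of parallelism is chosen per query from system load and parallelization efficiency, but it does not give the selection rule. The Python below is therefore only a minimal sketch of how a policy with that behavior could look. It is not the algorithm implemented in Bing. The names `ServerState` and `choose_degree`, the per-degree speedup profile, and the `min_efficiency` threshold are all hypothetical. The sketch does three things:

- It splits idle cores among the incoming query and the queries waiting behind it.
- It stops adding threads once parallel efficiency drops too low.
- It falls back to one thread when no cores are free.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical snapshot of one index server's load."""
    cores: int            # hardware threads available for query processing
    busy_threads: int     # threads currently running other queries
    waiting_queries: int  # queries queued behind the one being scheduled


def choose_degree(state: ServerState, speedup: list[float],
                  min_efficiency: float = 0.5) -> int:
    """Pick a degree of parallelism for the next query (illustrative heuristic).

    speedup[d - 1] is an assumed, offline-measured mean speedup of running a
    query with d threads, so speedup[0] == 1.0 means sequential execution.
    """
    free = max(0, state.cores - state.busy_threads)
    # Share idle cores between this query and those waiting behind it,
    # so a single query cannot grab every core while others queue.
    share = free // (state.waiting_queries + 1)
    # Never go below sequential execution or beyond the profiled degrees.
    cap = max(1, min(share, len(speedup)))
    best = 1
    for d in range(2, cap + 1):
        efficiency = speedup[d - 1] / d  # fraction of ideal linear speedup
        # Efficiency usually falls as d grows, so stop at the first degree
        # where extra threads would mostly be spent on overhead.
        if efficiency < min_efficiency:
            break
        best = d
    return best
```

Take a made-up profile of `[1.0, 1.8, 2.5, 3.0, 3.3, 3.5]` on a 12-core server. With this profile, the sketch behaves as follows:

- When the server is idle, it picks 6 threads.
- With 4 busy threads and 1 queued query, it picks 4 threads.
- When every core is busy and 5 queries are queued, it returns 1 (sequential execution).

This mirrors the qualitative behavior described in the abstract. Dividing free cores by queue length is one simple way to make the degree shrink with load. A production policy would more likely rely on measured response-time models.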