A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Filtered document retrieval with frequency-sorted indexes
Journal of the American Society for Information Science
Optimization of inverted vector searches
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
SEDA: an architecture for well-conditioned, scalable internet services
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Adaptive execution techniques for SMT multiprocessor architectures
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimization strategies for complex queries
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Online power-performance adaptation of multithreaded programs using hardware event-based prediction
Proceedings of the 20th annual international conference on Supercomputing
Expressing and exploiting concurrency in networked applications with aspen
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
The impact of caching on search engines
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Provably Efficient Online Nonclairvoyant Adaptive Scheduling
IEEE Transactions on Parallel and Distributed Systems
The cost of a cloud: research problems in data center networks
ACM SIGCOMM Computer Communication Review
Mapping parallelism to multi-cores: a machine learning based approach
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Using graphics processors for high performance IR query processing
Proceedings of the 18th international conference on World wide web
Improved techniques for result caching in web search engines
Proceedings of the 18th international conference on World wide web
Improving the performance of list intersection
Proceedings of the VLDB Endowment
Communications of the ACM
A refreshing perspective of search engine caching
Proceedings of the 19th international conference on World wide web
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
Web search using mobile cores: quantifying and mitigating the price of efficiency
Proceedings of the 37th annual international symposium on Computer architecture
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Power management of online data-intensive services
Proceedings of the 38th annual international symposium on Computer architecture
Posting list intersection on multicore architectures
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Better never than late: meeting deadlines in datacenter networks
Proceedings of the ACM SIGCOMM 2011 conference
Deadline-aware datacenter tcp (D2TCP)
Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication
Prefetching query results and its impact on search engines
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Thread reinforcer: Dynamically determining number of threads via OS level monitoring
IISWC '11 Proceedings of the 2011 IEEE International Symposium on Workload Characterization
Hi-index | 0.00 |
A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, the query is, however, processed sequentially. Although each server may be processing multiple queries concurrently, with modern multicore servers, parallelizing the processing of an individual query within the server may nonetheless improve the user's experience by reducing the response time. In this paper, we describe the issues that make the parallelization of an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Since each server may be processing multiple queries concurrently, we also present a adaptive resource management algorithm that chooses the degree of parallelism at run-time for each query, taking into account system load and parallelization efficiency. As a result, the servers now execute queries with a high degree of parallelism at low loads, gracefully reduce the degree of parallelism with increased load, and choose sequential execution under high load. We have implemented our parallelization approach and adaptive resource management algorithm in Bing servers and evaluated them experimentally with production workloads. The experimental results show that the mean and 95th-percentile response times for queries are reduced by more than 50% under light or moderate load. Moreover, under high load where parallelization adversely degrades the system performance, the response times are kept the same as when queries are executed sequentially. In all cases, we observe no degradation in the relevance of the search results.