Web search using mobile cores: quantifying and mitigating the price of efficiency

Authors:
Vijay Janapa Reddi; Benjamin C. Lee; Trishul Chilimbi; Kushagra Vaid
Affiliations:
Harvard University, Cambridge, MA, USA; Stanford University, Palo Alto, CA, USA; Microsoft Research, Redmond, WA, USA; Microsoft Corporation, Redmond, WA, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Abstract

The commoditization of hardware, data center economies of scale, and Internet-scale workload growth all demand greater power efficiency to sustain scalability. Traditional enterprise workloads, which are typically memory and I/O bound, have been well served by chip multiprocessors comprising small, power-efficient cores. Recent advances in mobile computing have led to modern small cores capable of delivering even better power efficiency. While these cores can deliver performance-per-Watt efficiency for data center workloads, small cores impact application quality-of-service, robustness, and flexibility, as these workloads increasingly invoke computationally intensive kernels. These challenges constitute the price of efficiency. We quantify efficiency for an industry-strength online web search engine in production at both the microarchitecture and system levels, evaluating search on server- and mobile-class architectures using Xeon and Atom processors.