Supporting MapReduce on large-scale asymmetric multi-core clusters

Authors:
M. Mustafa Rafique;Benjamin Rose;Ali R. Butt;Dimitrios S. Nikolopoulos
Affiliations:
Virginia Tech, Blacksburg, Virginia, USA;Virginia Tech, Blacksburg, Virginia, USA;Virginia Tech, Blacksburg, Virginia, USA;FORTH, Heraklion, Greece
Venue:
ACM SIGOPS Operating Systems Review
Year:
2009

Citing 22
Cited 6

Web Search for a Planet: The Google Cluster Architecture

IEEE Micro
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance

Proceedings of the 31st annual international symposium on Computer architecture
Distributed computing in practice: the Condor experience: Research Articles

Concurrency and Computation: Practice & Experience - Grid Performance
The Impact of Performance Asymmetry in Emerging Multicore Architectures

Proceedings of the 32nd annual international symposium on Computer Architecture
Introduction to the cell multiprocessor

IBM Journal of Research and Development - POWER5 and packaging
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
A Flexible Heterogeneous Multi-Core Architecture

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Parallel Computing
Evaluating MapReduce for Multi-core and Multiprocessor Systems

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
CellSort: high performance sorting on the cell processor

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cell broadband engine architecture and its first implementation: a performance view

IBM Journal of Research and Development
Cell/B.E. blades: building blocks for scalable, real-time, interactive, and digital media servers

IBM Journal of Research and Development
Merge: a programming model for heterogeneous multi-core systems

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Vectorized data processing on the cell broadband engine

DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Accelerating computing with the cell broadband engine processor

Proceedings of the 5th conference on Computing frontiers
Dma-based prefetching for i/o-intensive workloads on the cell architecture

Proceedings of the 5th conference on Computing frontiers
The PlayStation 3 for High-Performance Scientific Computing

Computing in Science and Engineering
Entering the petaflop era: the architecture and performance of Roadrunner

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Mars: a MapReduce framework on graphics processors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
FFTC: fastest Fourier transform for the IBM cell broadband engine

HiPC'07 Proceedings of the 14th international conference on High performance computing
Improving MapReduce performance in heterogeneous environments

OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
A capabilities-aware framework for using computational accelerators in data-intensive computing

Journal of Parallel and Distributed Computing
Variable-sized map and locality-aware reduce on public-resource grids

Future Generation Computer Systems
Reusable software components for accelerator-based clusters

Journal of Systems and Software
Variable-Sized map and locality-aware reduce on public-resource grids

GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Future Generation Computer Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Asymmetric multi-core processors (AMPs) with general-purpose and specialized cores packaged on the same chip, are emerging as a leading paradigm for high-end computing. A large body of existing research explores the use of standalone AMPs in computationally challenging and data-intensive applications. AMPs are rapidly deployed as high-performance accelerators on clusters. In these settings, scheduling, communication and I/O are managed by generalpurpose processors (GPPs), while computation is off-loaded to AMPs. Design space exploration for the configuration and software stack of hybrid clusters of AMPs and GPPs is an open problem. In this paper, we explore this design space in an implementation of the popular MapReduce programming model. Our contributions are: An exploration of various design alternatives for hybrid asymmetric clusters of AMPs and GPPs; the adoption of a streaming approach to supporting MapReduce computations on clusters with asymmetric components; and adaptive schedulers that take into account individual component capabilities in asymmetric clusters. Throughout our design, we remove I/O bottlenecks, using double-buffering and asynchronous I/O. We present an evaluation of the design choices through experiments on a real cluster with MapReduce workloads of varying degrees of computation intensity. We find that in a cluster with resource-constrained and well-provisioned AMP accelerators, a streaming approach achieves 50.5% and 73.1% better performance compared to the non-streaming approach, respectively, and scales almost linearly with increasing number of compute nodes.We also show that our dynamic scheduling mechanisms adapt effectively the parameters of the scheduling policies between applications with different computation density.