MapReduce for the Cell Broadband Engine Architecture

Authors:
M. de Kruijf;K. Sankaralingam
Affiliations:
University of Wisconsin, Department of Computer Science, Madison, Wisconsin;University of Wisconsin, Department of Computer Science, Madison, Wisconsin
Venue:
IBM Journal of Research and Development
Year:
2009


Abstract

MapReduce is a simple and flexible parallel programming model proposed by Google for large-scale distributed data processing. In this paper, we present a design and prototype implementation of MapReduce for the Cell Broadband Engine® Architecture (CBEA). The MapReduce model provides a simple machine abstraction that shields users from parallelization and other distributed programming complications. The goal of this paper is to describe the tradeoffs in the design of the runtime and demonstrate the potential for high performance. We study the basic characteristics of the MapReduce model and identify three types of MapReduce applications: map dominated, partition dominated, and sort dominated. We evaluate our runtime performance, scalability, and efficiency for microbenchmarks representing each of these application types as well as for complete applications. We find that map-dominated applications map well to the CBEA and that our prototype sustains high performance on these applications. For partition-dominated and sort-dominated applications, we analyze runtime performance, identify sources of inefficiency, and propose several future enhancements to significantly improve performance. Overall, we find that the simplicity and efficiency of the model make it an attractive tool for programming Cell Broadband Engine processor-based platforms.
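To make the map, partition, and sort phases named in the abstract concrete, the following is a minimal single-process sketch of the generic MapReduce model in Python. It is not the authors' CBEA runtime and uses no Cell-specific APIs. All function names (`map_phase`, `partition_phase`, `sort_phase`, `reduce_phase`, `run_mapreduce`) and the word-count example are hypothetical, chosen only for illustration.

```python
def map_phase(records, map_fn):
    """Apply map_fn to every input record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def partition_phase(pairs, num_partitions):
    """Place each pair in a partition chosen by hashing its key, so equal keys share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def sort_phase(partition):
    """Sort one partition by key so that pairs with the same key become adjacent."""
    return sorted(partition, key=lambda kv: kv[0])


def reduce_phase(sorted_pairs, reduce_fn):
    """Walk the sorted pairs, group adjacent equal keys, and reduce each group to one value."""
    results = []
    i = 0
    while i < len(sorted_pairs):
        key = sorted_pairs[i][0]
        values = []
        while i < len(sorted_pairs) and sorted_pairs[i][0] == key:
            values.append(sorted_pairs[i][1])
            i += 1
        results.append((key, reduce_fn(key, values)))
    return results


def run_mapreduce(records, map_fn, reduce_fn, num_partitions=4):
    """Run map, partition, sort, and reduce in sequence (illustrative only, no parallelism)."""
    pairs = map_phase(records, map_fn)
    output = {}
    for partition in partition_phase(pairs, num_partitions):
        for key, value in reduce_phase(sort_phase(partition), reduce_fn):
            output[key] = value
    return output


# Hypothetical user-supplied functions: a word count.
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


if __name__ == "__main__":
    counts = run_mapreduce(["a b a", "b c"], word_count_map, word_count_reduce)
    print(sorted(counts.items()))
```

In this sketch the user writes only `word_count_map` and `word_count_reduce`; the remaining phases belong to the runtime. Under the abstract's classification, an application would be map dominated, partition dominated, or sort dominated according to which of these phases accounts for most of its execution time.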