MapReduce is a simple and flexible parallel programming model proposed by Google for large-scale distributed data processing. In this paper, we present a design and prototype implementation of MapReduce for the Cell Broadband Engine® Architecture (CBEA). The MapReduce model provides a simple machine abstraction that shields users from parallelization and other distributed programming complications. The goal of this paper is to describe the tradeoffs in the design of the runtime and demonstrate the potential for high performance. We study the basic characteristics of the MapReduce model and identify three types of MapReduce applications: map dominated, partition dominated, and sort dominated. We evaluate our runtime performance, scalability, and efficiency for microbenchmarks representing each of these application types as well as for complete applications. We find that map-dominated applications map well to the CBEA and that our prototype sustains high performance on these applications. For partition-dominated and sort-dominated applications, we analyze runtime performance, identify sources of inefficiency, and propose several future enhancements to significantly improve performance. Overall, we find that the simplicity and efficiency of the model make it an attractive tool for programming Cell Broadband Engine processor-based platforms.
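To make the three phases named above concrete, here is a minimal, single-process sketch of a MapReduce pipeline with explicit map, partition, and sort steps followed by a reduce. It is an illustration of the general model only, not the CBEA runtime described in the paper: every function name (map_phase, partition_phase, sort_phase, reduce_phase, run_mapreduce), the hash-based partitioner, and the word-count workload are assumptions chosen for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records, map_fn):
    """Apply the user map function to each record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def partition_phase(pairs, num_partitions):
    """Route each pair to a bucket chosen by hashing its key (assumed partitioner)."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets


def sort_phase(bucket):
    """Sort one bucket by key so that equal keys become adjacent."""
    return sorted(bucket, key=itemgetter(0))


def reduce_phase(sorted_bucket, reduce_fn):
    """Group adjacent equal keys and apply the user reduce function to each group."""
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(sorted_bucket, key=itemgetter(0))
    ]


def run_mapreduce(records, map_fn, reduce_fn, num_partitions=4):
    """Run map, partition, sort, and reduce in sequence and merge the results."""
    pairs = map_phase(records, map_fn)
    results = {}
    for bucket in partition_phase(pairs, num_partitions):
        for key, value in reduce_phase(sort_phase(bucket), reduce_fn):
            results[key] = value
    return results


# Hypothetical word-count workload used only to exercise the pipeline.
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


if __name__ == "__main__":
    counts = run_mapreduce(["a b a", "b c"], word_count_map, word_count_reduce)
    print(sorted(counts.items()))
```

In this sketch, a workload whose map function is expensive relative to the number of pairs it emits would spend most of its time in map_phase, while one that emits many cheap pairs would shift the cost toward partition_phase or sort_phase. That is one informal way to read the map-dominated, partition-dominated, and sort-dominated categories; the abstract does not give the paper's precise criteria.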