CellMR: A framework for supporting MapReduce on asymmetric Cell-based clusters

  • Authors:
  • M. Mustafa Rafique;Benjamin Rose;Ali R. Butt;Dimitrios S. Nikolopoulos

  • Affiliations:
  • Dept. of Computer Science, Virginia Tech, Blacksburg, USA;Dept. of Computer Science, Virginia Tech, Blacksburg, USA;Dept. of Computer Science, Virginia Tech, Blacksburg, USA;Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR 700 13, Heraklion Crete, Greece

  • Venue:
  • IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
  • Year:
  • 2009

Abstract

The use of asymmetric multi-core processors with on-chip computational accelerators is becoming common in a variety of environments ranging from scientific computing to enterprise applications. Current research has focused on making efficient use of individual systems and on porting applications to asymmetric processors. In this paper, we take the next step by investigating the use of multi-core-based systems, especially the popular Cell processor, in a cluster setting. We present CellMR, an efficient and scalable implementation of the MapReduce framework for asymmetric Cell-based clusters. The novelty of CellMR lies in its adoption of a streaming approach to supporting MapReduce and in its adaptive resource scheduling schemes: instead of allocating workloads to the components once, CellMR slices the input into small work units and streams them to the asymmetric nodes for efficient processing. Moreover, CellMR removes I/O bottlenecks by design, using a number of techniques, such as double-buffering and asynchronous I/O, to maximize cluster performance. Our evaluation of CellMR using typical MapReduce applications shows that it achieves 50.5% better performance than the standard non-streaming approach, introduces a very small overhead on the manager irrespective of application input size, scales almost linearly with an increasing number of compute nodes (a speedup of 6.9 on average when using eight nodes compared to a single node), and effectively adapts the parameters of its resource management policy between applications with varying computation density.
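
The streaming idea described in the abstract can be illustrated with a small sketch. This is not CellMR's implementation or API: the function names (`slice_input`, `map_unit`, `stream_mapreduce`), the word-count workload, and the use of a single background thread to stand in for asynchronous I/O are all assumptions made for illustration. It only shows the general pattern of slicing the input into small work units and prefetching the next unit while the current one is processed, which is a simple form of double buffering.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def slice_input(data, unit_size):
    """Yield consecutive work units of at most unit_size items (hypothetical helper)."""
    for start in range(0, len(data), unit_size):
        yield data[start:start + unit_size]


def map_unit(unit):
    """Map step: count words in one work unit (stand-in for work done on a compute node)."""
    counts = Counter()
    for line in unit:
        counts.update(line.split())
    return counts


def stream_mapreduce(data, unit_size=2):
    """Process work units one at a time, fetching the next unit in the
    background while the current one is mapped (assumed double-buffering scheme)."""
    units = slice_input(data, unit_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(next, units, None)      # fill the first buffer
        while True:
            current = pending.result()                  # wait for the filled buffer
            if current is None:                         # input exhausted
                break
            pending = loader.submit(next, units, None)  # prefetch the next buffer
            total.update(map_unit(current))             # reduce: merge partial counts
    return total


if __name__ == "__main__":
    lines = ["a b a", "b c", "a", "c c"]
    print(stream_mapreduce(lines, unit_size=2))
```

In this sketch only two work units are resident at any time (the one being mapped and the one being prefetched), so memory use on the node is bounded by the unit size rather than by the total input size.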