Carbon nanotube coated high-throughput neurointerfaces in assistive environments
Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments
FPMR: MapReduce framework on FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
A Capabilities-Aware Programming Model for Asymmetric High-End Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Multi-GPU volume rendering using MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Dynamic proportional share scheduling in Hadoop
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Reusable software components for accelerator-based clusters
Journal of Systems and Software
Optimizing MapReduce for GPUs with effective shared memory usage
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
HAT: history-based auto-tuning MapReduce in heterogeneous environments
The Journal of Supercomputing
Accelerate MapReduce on GPUs with multi-level reduction
Proceedings of the 5th Asia-Pacific Symposium on Internetware
CPU+GPU scheduling with asymptotic profiling
Parallel Computing
Hi-index | 0.00 |
The use of asymmetric multi-core processors with on-chip computational accelerators is becoming common in a variety of environments ranging from scientific computing to enterprise applications. The focus of current research has been on making efficient use of individual systems, and porting applications to asymmetric processors. In this paper, we take the next step by investigating the use of multi-core-based systems, especially the popular Cell processor, in a cluster setting. We present CellMR, an efficient and scalable implementation of the MapReduce framework for asymmetric Cell-based clusters. The novelty of CellMR lies in its adoption of a streaming approach to supporting MapReduce, and its adaptive resource scheduling schemes: Instead of allocating workloads to the components once, CellMR slices the input into small work units and streams them to the asymmetric nodes for efficient processing. Moreover, CellMR removes I/O bottlenecks by design, using a number of techniques, such as double-buffering and asynchronous I/O, to maximize cluster performance. Our evaluation of CellMR using typical MapReduce applications shows that it achieves 50.5% better performance compared to the standard nonstreaming approach, introduces a very small overhead on the manager irrespective of application input size, scales almost linearly with increasing number of compute nodes (a speedup of 6.9 on average, when using eight nodes compared to a single node), and adapts effectively the parameters of its resource management policy between applications with varying computation density.