More convenient more overhead: the performance evaluation of Hadoop streaming

Authors:
Mengwei Ding;Long Zheng;Yanchao Lu;Li Li;Song Guo;Minyi Guo
Affiliations:
Shanghai Jiao Tong University, Shanghai, China;The University of Aizu, Aizu-wakamatsu, Japan;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;The University of Aizu, Aizu-wakamatsu, Japan;Shanghai Jiao Tong University, Shanghai, China
Venue:
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
Year:
2011

Abstract

Hadoop is a popular implementation of the MapReduce programming model, which makes programming on distributed systems much easier. In computing, convenience usually comes at the cost of performance: compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language, making the power of Hadoop easier to use, but it also brings a performance penalty. Through a deep analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming degrades performance substantially only for data-intensive jobs; for computation-intensive jobs, Hadoop Streaming may even perform better because it can use a more efficient language than Java.
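
To make the pipe mechanism concrete, here is a minimal sketch of a word-count job written in the Hadoop Streaming style. It is an illustration only and is not taken from the paper or its 6 benchmarks; the function names `map_lines` and `reduce_lines` and the file name `wordcount.py` are hypothetical. With Hadoop Streaming, the mapper and reducer are external programs that read records from standard input and write tab-separated key/value records to standard output, so every record crosses a pipe between the Java task and the external process. That crossing is the path the abstract identifies as the major bottleneck.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated (word, 1) record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key; input must arrive sorted by key."""
    # Skip blank lines, then split each record into [key, value].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Run as "wordcount.py map" or "wordcount.py reduce" (hypothetical file name).
# Every record read from stdin and printed to stdout travels through a pipe.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The same data path can be imitated locally with a shell pipeline such as `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for the shuffle phase. A job of this kind does very little computation per record, so the cost of serializing each record through the pipe is likely to dominate, which matches the abstract's finding about data-intensive jobs.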