More convenient more overhead: the performance evaluation of Hadoop streaming

  • Authors:
  • Mengwei Ding;Long Zheng;Yanchao Lu;Li Li;Song Guo;Minyi Guo

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China;The University of Aizu, Aizu-wakamatsu, Japan;Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;The University of Aizu, Aizu-wakamatsu, Japan;Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • Proceedings of the 2011 ACM Symposium on Research in Applied Computation
  • Year:
  • 2011

Abstract

Hadoop is a popular implementation of the MapReduce programming model, which has made programming on distributed systems much easier. In computing, convenience usually comes at the cost of performance. Compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language and so makes the power of Hadoop easier to use. However, Hadoop Streaming also brings a performance penalty. Through a deep analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming degrades performance significantly only for data-intensive jobs; for computation-intensive jobs, Hadoop Streaming may even perform better because it can use a more efficient language than Java.
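
To make the pipe mechanism concrete, the following is a minimal sketch of a word-count mapper written in the style Hadoop Streaming expects: it reads input lines from standard input and writes tab-separated key/value records to standard output. It is an illustration only and is not taken from the paper or its benchmarks; the function names `map_lines` and `run` are hypothetical. Every record passes through a pipe on the way in and again on the way out, which is where a data-intensive job would be expected to pay the overhead the abstract describes.

```python
from typing import Iterable, Iterator, TextIO


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Turn raw input lines into tab-separated (word, 1) records.

    Hypothetical helper for illustration; not code from the paper.
    """
    for line in lines:
        # str.split() with no argument drops the trailing newline
        # and skips empty lines.
        for word in line.split():
            yield f"{word}\t1"


def run(stdin: TextIO, stdout: TextIO) -> None:
    """Stream records from stdin to stdout, one per line.

    Under Hadoop Streaming both streams are pipes to the Java task,
    so each record is copied across a process boundary twice.
    """
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

To use this as an actual streaming mapper, one would call `run(sys.stdin, sys.stdout)` at the bottom of the script (after `import sys`). The mapping logic is kept in a separate function so it can be checked without any pipes involved.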