Understanding the Linux Kernel, Second Edition
Understanding the Linux Kernel, Second Edition
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Future Generation Computer Systems
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
Multi-GPU volume rendering using MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
The Client and the Cloud: Democratizing Research Computing
IEEE Internet Computing
Cloud Technologies for Bioinformatics Applications
IEEE Transactions on Parallel and Distributed Systems
Performance evaluation of parallel strategies in public clouds: A study with phylogenomic workflows
Future Generation Computer Systems
Speeding-up codon analysis on the cloud with local MapReduce aggregation
Information Sciences: an International Journal
Hi-index | 0.00 |
Hadoop is one popular implementation of MapReduce programming model, which has made programming on distributed system with much ease. In computer world, the convenience is always at the cost of performance. Comparing with MPI, Hadoop simplifies the programming, but it degrades the performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming, since Hadoop Streaming is widely used as it frees programmers from Java language, which makes programmers use the power of Hadoop more easily. Also, Hadoop Streaming brings the performance penalty. With deep analysis of Hadoop Streaming mechanism, we find out that pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks, The experiment results show that Hadoop Streaming degrades the performance a lot only for data intensive jobs, and for computational intensive jobs, Hadoop Streaming may even performs better because of using a more effiecient language than Java.