Apache Hadoop performance-tuning methodologies and best practices

  • Authors:
  • Shrinivas B. Joshi

  • Affiliations:
  • Advanced Micro Devices, Inc., Austin, TX, USA

  • Venue:
  • ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
  • Year:
  • 2012

Abstract

Apache Hadoop is a Java-based distributed computing framework built for applications implemented using the MapReduce programming model. In recent years, Hadoop has seen unprecedented growth in adoption. From single-node clusters to clusters with thousands of nodes, Hadoop is used to perform a wide range of functions, such as search optimization, data mining, clickstream analytics, and machine learning. Although setting up Hadoop clusters and building applications for Hadoop are well-understood tasks, tuning Hadoop clusters for optimal performance is still a black art. In this demo paper, we attempt to give the audience a holistic view of Hadoop performance-tuning methodologies and best practices. We discuss hardware as well as software tuning techniques, including tuning of BIOS, OS, JVM, and Hadoop configuration parameters.