RasterZip: compressing network monitoring data with support for partial decompression
Proceedings of the 2012 ACM conference on Internet measurement conference
Hi-index | 0.00 |
Apache Hadoop is a Java based distributed computing framework built for applications implemented using MapReduce programming model. In recent years, Hadoop technology has experienced an unprecedented growth in its adoption. From single-node clusters to clusters with well over thousands of nodes, Hadoop technology is being used to perform myriad of functions - search optimizations, data mining, click stream analytics, machine learning to name a few. Although setting up Hadoop clusters and building applications for Hadoop is a well understood area, tuning Hadoop clusters for optimal performance is still a black art. In this demo paper, we will attempt to provide the audience with a holistic approach of Hadoop performance tuning methodologies and best practices. We discuss hardware as well as software tuning techniques including BIOS, OS, JVM and Hadoop configuration parameters tuning.