HiTune: dataflow-based performance analysis for big data cloud

Authors:
Jinquan Dai;Jie Huang;Shengsheng Huang;Bo Huang;Yan Liu
Affiliations:
Intel Asia-Pacific Research and Development Ltd, Shanghai, P.R.China;Intel Asia-Pacific Research and Development Ltd, Shanghai, P.R.China;Intel Asia-Pacific Research and Development Ltd, Shanghai, P.R.China;Intel Asia-Pacific Research and Development Ltd, Shanghai, P.R.China;Intel Asia-Pacific Research and Development Ltd, Shanghai, P.R.China
Venue:
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Year:
2011

Abstract

Although Big Data Cloud platforms (e.g., MapReduce, Hadoop and Dryad) make it easy to develop and run highly scalable applications, efficient provisioning and fine-tuning of these massively distributed systems remain a major challenge. In this paper, we describe a general approach to help address this challenge, based on distributed instrumentation and dataflow-driven performance analysis. Based on this approach, we have implemented HiTune, a scalable, lightweight and extensible performance analyzer for Hadoop. We report our experience of how HiTune helps users conduct Hadoop performance analysis and tuning efficiently, demonstrating the benefits of dataflow-based analysis and the limitations of existing approaches (e.g., system statistics, Hadoop logs and metrics, and traditional profiling).