Mochi: visual log-analysis based tools for debugging Hadoop

Authors:
Jiaqi Tan;Xinghao Pan;Soila Kavulya;Rajeev Gandhi;Priya Narasimhan
Affiliations:
Electrical & Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA (all authors)
Venue:
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Year:
2009

Abstract

Mochi, a new visual, log-analysis-based debugging tool, correlates Hadoop's behavior in space, time, and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi's analysis produces visualizations of Hadoop's behavior that users can use to reason about and debug performance issues. Using the Yahoo! M45 Hadoop cluster, we provide examples of Mochi's value in revealing a Hadoop job's structure, in optimizing real-world workloads, and in identifying anomalous Hadoop behavior.