In modern data centers, Hadoop is widely used to perform data-intensive computation. Administrators of large-scale Hadoop clusters rely on statistical data collected at runtime to measure how efficiently the cluster is utilized. In this paper, we propose three statistical metrics (data locality ratio, load balance coefficient, and access balance coefficient) to quantify performance losses in data-intensive applications. We evaluated our metrics using a large-scale web click-stream application running on a production Hadoop cluster at Tencent Inc.
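
The abstract names the three metrics without defining them, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes the data locality ratio is the fraction of map tasks that read their input from a local replica, and it uses the coefficient of variation (population standard deviation divided by the mean) as a stand-in for both balance coefficients. The function names and sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def data_locality_ratio(data_local_maps: int, total_maps: int) -> float:
    """Assumed definition: share of map tasks that ran on a node holding their input."""
    if total_maps == 0:
        return 0.0
    return data_local_maps / total_maps


def balance_coefficient(values) -> float:
    """Assumed stand-in: coefficient of variation (population std / mean).

    0.0 means a perfectly even spread; larger values mean more skew.
    Apply it to tasks per node for load balance, or to reads per
    block/node for access balance.
    """
    values = list(values)
    if not values:
        return 0.0
    avg = mean(values)
    if avg == 0:
        return 0.0
    return pstdev(values) / avg


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(data_locality_ratio(80, 100))           # share of data-local map tasks
    print(balance_coefficient([10, 10, 10, 10]))  # evenly loaded nodes
    print(balance_coefficient([1, 1, 1, 5]))      # one hot block
```

Under these assumptions, a low locality ratio or a high balance coefficient would point to time lost on remote reads or on stragglers; the paper's own definitions may weight or normalize these quantities differently.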