Detection and analysis of resource usage anomalies in large distributed systems through multi-scale visualization

  • Authors:
  • Lucas Mello Schnorr;Arnaud Legrand;Jean-Marc Vincent

  • Affiliations:
  • INRIA, CNRS, University of Grenoble, Grenoble, France;INRIA, CNRS, University of Grenoble, Grenoble, France;INRIA, CNRS, University of Grenoble, Grenoble, France

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Understanding the behavior of large scale distributed systems is generally extremely difficult as it requires to observe a very large number of components over very large time. Most analysis tools for distributed systems gather basic information such as individual processor or network utilization. Although scalable because of the data reduction techniques applied before the analysis, these tools are often insufficient to detect or fully understand anomalies in the dynamic behavior of resource utilization and their influence on the applications performance. In this paper, we propose a methodology for detecting resource usage anomalies in large scale distributed systems. The methodology relies on four functionalities: characterized trace collection, multi-scale data aggregation, specifically tailored user interaction techniques, and visualization techniques. We show the efficiency of this approach through the analysis of simulations of the volunteer computing Berkeley Open Infrastructure for Network Computing architecture. Three scenarios are analyzed in this paper: analysis of the resource sharing mechanism, resource usage considering response time instead of throughput, and the evaluation of input file size on Berkeley Open Infrastructure for Network Computing architecture. The results show that our methodology enables to easily identify resource usage anomalies, such as unfair resource sharing, contention, moving network bottlenecks, and harmful short-term resource sharing. Copyright © 2011 John Wiley & Sons, Ltd.