Understanding the behavior of large-scale distributed systems is generally extremely difficult because it requires observing a very large number of components over very long periods of time. Most analysis tools for distributed systems gather basic information such as individual processor or network utilization. Although these tools scale well thanks to the data reduction techniques applied before the analysis, they are often insufficient to detect or fully understand anomalies in the dynamic behavior of resource utilization and their influence on application performance. In this paper, we propose a methodology for detecting resource usage anomalies in large-scale distributed systems. The methodology relies on four functionalities: characterized trace collection, multi-scale data aggregation, specifically tailored user interaction techniques, and visualization techniques. We show the efficiency of this approach through the analysis of simulations of the Berkeley Open Infrastructure for Network Computing (BOINC) volunteer computing architecture. Three scenarios are analyzed in this paper: the resource sharing mechanism, resource usage when considering response time instead of throughput, and the effect of input file size on the BOINC architecture. The results show that our methodology makes it easy to identify resource usage anomalies such as unfair resource sharing, contention, moving network bottlenecks, and harmful short-term resource sharing. Copyright © 2011 John Wiley & Sons, Ltd.
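To make the idea of multi-scale data aggregation more concrete, the following minimal Python sketch averages utilization samples over a chosen time window and a chosen grouping of hosts. It is only an illustration under stated assumptions, not the implementation used in the paper: the function name `aggregate`, the `(timestamp, host, utilization)` tuple layout, and the sample trace are all hypothetical.

```python
from collections import defaultdict


def aggregate(samples, window, group_of=lambda host: host):
    """Average utilization per (group, time slice).

    samples:  iterable of (timestamp, host, utilization in [0, 1])
    window:   width of a time slice, in the same unit as the timestamps
    group_of: maps a host to the group it is merged into (spatial scale)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, host, u in samples:
        key = (group_of(host), int(t // window))
        sums[key] += u
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical trace: two hosts alternate between busy and idle every time unit.
trace = [(t, "h1", 1.0 if t % 2 == 0 else 0.0) for t in range(4)]
trace += [(t, "h2", 0.0 if t % 2 == 0 else 1.0) for t in range(4)]

# Fine scale: one value per host and per time unit.
fine = aggregate(trace, window=1)

# Coarse scale: one value for the whole (assumed) cluster over the full trace.
coarse = aggregate(trace, window=4, group_of=lambda host: "cluster")
```

In this toy trace the coarse view reports a steady 50% utilization for the whole group, while the fine view shows each host flipping between fully busy and fully idle. Moving between such scales is one way short-term sharing effects can become visible even though they are hidden by a single aggregated value.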