Large-scale distributed systems are composed of many thousands of computing units. Today's examples of such systems are grid, volunteer, and cloud computing platforms. They are generally analyzed with monitoring tools that gather resource information such as processor or network utilization and provide high-level statistics and basic resource usage traces. Such approaches are recognized as fairly scalable but are often insufficient to detect or fully understand unexpected behavior. In this paper, we investigate the use in distributed systems of more detailed tracing techniques, of the kind commonly used in parallel computing. Finely analyzing the behavior of systems comprising thousands of resources over several months may seem infeasible. Yet we show that the resulting traces can be analyzed with tools that make it easy to zoom in and out on selected regions of space and time. We use the BOINC volunteer computing system as the basis of this study. Since detailed activity traces of BOINC clients are not yet available, we rely instead on traces produced by a BOINC simulator developed with the SimGrid toolkit, which takes as input real availability trace files from the SETI@home BOINC project. We show that analyzing such detailed resource utilization traces provides several non-trivial insights into the whole system and enables the discovery of unexpected behavior.
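To make the idea of zooming in space and time concrete, the following is a minimal sketch and not the tooling used in the paper. It assumes a hypothetical trace of per-host busy/idle intervals and computes the fraction of time a chosen set of hosts was busy within a chosen time window. The `Interval` record, its field names, and the `utilization` function are illustrative assumptions; they do not reflect the BOINC, SimGrid, or any visualization tool's actual trace format.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Interval:
    """One trace record (hypothetical format, for illustration only)."""
    host: str     # host identifier
    start: float  # interval start, in seconds since the trace began
    end: float    # interval end (exclusive), in seconds
    busy: bool    # True if the host was computing during the interval


def utilization(trace: Iterable[Interval], hosts: Set[str],
                t0: float, t1: float) -> float:
    """Fraction of host-time in [t0, t1) that the selected hosts were busy.

    Widening `hosts` or the window zooms out (coarser aggregate);
    narrowing them zooms in on a specific area of space and time.
    """
    if t1 <= t0 or not hosts:
        raise ValueError("need a non-empty host set and a window with t1 > t0")
    busy_time = 0.0
    for iv in trace:
        if iv.busy and iv.host in hosts:
            # Clip the interval to the selected time window.
            overlap = min(iv.end, t1) - max(iv.start, t0)
            if overlap > 0:
                busy_time += overlap
    return busy_time / (len(hosts) * (t1 - t0))
```

Calling `utilization` with all hosts over the full trace duration gives a single system-wide figure, while restricting it to one host and a short window exposes local behavior that the aggregate hides. This sketch assumes the busy intervals of a given host do not overlap one another.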