Modeling and Evaluating Design Alternatives for an On-Line Instrumentation System: A Case Study
IEEE Transactions on Software Engineering
This paper presents a case study of modeling, evaluating, and testing the data collection services (called an instrumentation system) of the Paradyn parallel performance measurement tool using well-known performance evaluation and experiment design techniques. The overall objective of the study is to use modeling- and simulation-based evaluation to give the tool developers feedback that helps them choose system configurations and task scheduling policies that can significantly reduce data collection overheads. We develop a resource occupancy (ROCC) model of the Paradyn instrumentation system (IS) on an IBM SP-2 platform. The model is parameterized with a measurement-based workload characterization and then used to answer several "what if" questions about configuration options and about two policies for scheduling instrumentation system tasks: collect-and-forward (CF) and batch-and-forward (BF). Simulation results indicate that the BF policy can significantly reduce the overheads. Based on this feedback, the BF policy was implemented in the Paradyn IS as an option for managing data collection. Measurement-based testing results obtained from this enhanced version of the Paradyn IS are reported in this paper and indicate a reduction of more than 60% in the direct IS overheads when the BF policy is used.
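The difference between the two policies can be illustrated with a toy cost model. The sketch below is not the ROCC model from the paper. It assumes that CF forwards each sample as soon as it is collected, that BF accumulates samples and forwards them in groups, and that every forwarding operation has a fixed cost plus a cost per sample. The function name, the parameters, and all numeric values are hypothetical and chosen only for illustration; they are not measurements from the study.

```python
def forwarding_overhead(num_samples, batch_size, per_call_cost, per_sample_cost):
    """Total forwarding cost under a simple fixed-plus-variable cost model.

    All parameters are hypothetical illustration values:
      num_samples     -- number of instrumentation samples to forward
      batch_size      -- samples sent per forwarding operation (1 models CF)
      per_call_cost   -- fixed cost charged once per forwarding operation
      per_sample_cost -- cost charged for every sample regardless of batching
    """
    if num_samples < 0 or batch_size < 1:
        raise ValueError("num_samples must be >= 0 and batch_size >= 1")
    # Ceiling division: a partially filled final batch still costs one call.
    calls = -(-num_samples // batch_size)
    return calls * per_call_cost + num_samples * per_sample_cost


# Collect-and-forward: one forwarding operation per sample.
cf_cost = forwarding_overhead(1000, 1, per_call_cost=50.0, per_sample_cost=5.0)

# Batch-and-forward: up to 32 samples per forwarding operation.
bf_cost = forwarding_overhead(1000, 32, per_call_cost=50.0, per_sample_cost=5.0)

# Fraction of the CF cost that batching removes in this toy model.
reduction = 1.0 - bf_cost / cf_cost

print(f"CF cost: {cf_cost}, BF cost: {bf_cost}, reduction: {reduction:.0%}")
```

In this model batching only amortizes the fixed cost per forwarding operation, so the benefit depends on how large that fixed cost is relative to the cost per sample. It also ignores the added delay before a batched sample is delivered.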