Parallel performance prediction using lost cycles analysis

Authors:
Mark E. Crovella; Thomas J. LeBlanc
Affiliations:
University of Rochester, Rochester, New York; University of Rochester, Rochester, New York
Venue:
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Year:
1994


Abstract

Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used in practice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper we describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling.

Our approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We first describe a tool for measuring overheads in parallel programs that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. We then describe a tool that fits these overhead measurements to analytic forms. We illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how our tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
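The step of fitting overhead measurements to analytic forms can be illustrated with a minimal sketch. This is not the authors' tool: the candidate forms, the function names (`fit_scale`, `best_form`, `predict_time`), and the sample measurements are all hypothetical, and the sketch assumes that running time on p processors is pure computation plus total overhead, divided by p.

```python
import math

# Hypothetical candidate forms for how one overhead category grows with
# the processor count p; the model families used in the paper may differ.
CANDIDATE_FORMS = {
    "constant": lambda p: 1.0,
    "log p": lambda p: math.log2(p),
    "p": lambda p: float(p),
    "p log p": lambda p: p * math.log2(p),
}


def fit_scale(form, procs, overheads):
    """Least-squares c for overhead ~= c * form(p); returns (c, squared residual)."""
    basis = [form(p) for p in procs]
    denom = sum(b * b for b in basis)
    c = sum(b * y for b, y in zip(basis, overheads)) / denom if denom else 0.0
    residual = sum((y - c * b) ** 2 for b, y in zip(basis, overheads))
    return c, residual


def best_form(procs, overheads):
    """Pick the candidate form with the smallest squared residual."""
    fits = {name: fit_scale(f, procs, overheads)
            for name, f in CANDIDATE_FORMS.items()}
    name = min(fits, key=lambda n: fits[n][1])
    return name, fits[name][0]


def predict_time(pure_compute, overhead_models, p):
    """Assumed model: T(p) = (pure computation + summed overheads at p) / p."""
    total_overhead = sum(c * CANDIDATE_FORMS[name](p)
                         for name, c in overhead_models)
    return (pure_compute + total_overhead) / p


# Synthetic measurements (not from the paper): total cycles lost to one
# overhead category, generated as 3 * p * log2(p).
procs = [2, 4, 8, 16]
overheads = [6.0, 24.0, 72.0, 192.0]

name, c = best_form(procs, overheads)
print(name, c)
print(predict_time(100.0, [(name, c)], 4))
```

In practice each overhead category would be fitted separately, and the resulting models summed to predict running time at processor counts or problem sizes that were never measured.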