Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used in practice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper we describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling.

Our approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We first describe a tool for measuring overheads in parallel programs that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. We then describe a tool that fits these overhead measurements to analytic forms. We illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how our tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
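To make the idea of fitting overhead measurements to analytic forms concrete, the following is a minimal Python sketch, not the tools described in the paper. It assumes that total overhead can be computed as aggregate processor time minus pure computation time, and that one overhead category is modeled as a + b * g(p) for a small set of candidate forms g of the processor count p. The function names, the candidate forms, and the least-squares selection rule are all hypothetical choices made for illustration.

```python
import math

# Hypothetical candidate forms g(p) for how one overhead category might grow
# with processor count p. These are illustrative assumptions only.
CANDIDATE_FORMS = {
    "p": lambda p: float(p),
    "log2(p)": lambda p: math.log2(p),
    "p^2": lambda p: float(p * p),
    "1/p": lambda p: 1.0 / p,
}


def lost_cycles(t_serial, t_parallel, procs):
    """Total overhead in processor-seconds, ASSUMING it equals the aggregate
    processor time (procs * t_parallel) minus the pure computation time."""
    return procs * t_parallel - t_serial


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b, sse).

    Requires at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_overhead(procs, overheads):
    """Fit overhead = a + b * g(p) for each candidate form g and return
    (form_name, a, b, sse) for the form with the smallest squared error."""
    best = None
    for name, form in CANDIDATE_FORMS.items():
        xs = [form(p) for p in procs]
        a, b, sse = fit_line(xs, overheads)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best


if __name__ == "__main__":
    # Synthetic measurements for one overhead category (made-up numbers).
    procs = [1, 2, 4, 8, 16, 32]
    measured = [2.0 + 0.5 * math.log2(p) for p in procs]
    name, a, b, sse = fit_overhead(procs, measured)
    print(f"best form: {a:.3f} + {b:.3f} * {name} (sse={sse:.3g})")
```

In a sketch like this, each overhead category would be fitted separately and the fitted forms summed to predict total overhead at processor counts that were not measured. Choosing the form by squared error alone is a simplification; a real tool would also need to guard against overfitting to a handful of measurements.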