We present a new technique for identifying scalability bottlenecks in executions of single-program, multiple-data (SPMD) parallel programs, quantifying their impact on performance, and associating this information with the program source code. Our performance analysis strategy involves three steps. First, we collect call path profiles for two or more executions on different numbers of processors. Second, we use our expectations about how the performance of executions should differ, e.g., linear speedup for strong scaling or constant execution time for weak scaling, to automatically compute the scalability of costs incurred at each point in a program's execution. Third, with the aid of an interactive browser, an application developer can explore a program's performance in a top-down fashion, see the contexts in which poor scaling behavior arises, and understand exactly how much each scalability bottleneck dilates execution time. Our analysis technique is independent of the parallel programming model. We describe our experiences applying our technique to analyze parallel programs written in Co-array Fortran and Unified Parallel C, as well as message-passing programs based on MPI.
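The second step above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual implementation: the profile format, function names, and toy numbers are all assumptions. For strong scaling on P and Q processors (Q > P), the expectation is linear speedup, so a perfectly scaling cost C measured on P processors should shrink to C * P / Q on Q processors; any measured cost beyond that is excess work, attributed to the call-path node where it was incurred.

```python
def scaling_loss(profile_p, profile_q, p, q):
    """Per-call-path fraction of the Q-processor execution time that is
    excess work relative to the linear-speedup expectation.

    profile_p, profile_q: dicts mapping a call path to its inclusive
    cost (e.g., seconds) on p and q processors, respectively.
    (Illustrative format, not the tool's real profile representation.)
    """
    total_q = sum(profile_q.values())
    losses = {}
    for node, cost_q in profile_q.items():
        # Linear-speedup expectation: cost shrinks by a factor of q/p.
        expected = profile_p.get(node, 0.0) * p / q
        excess = max(cost_q - expected, 0.0)
        losses[node] = excess / total_q
    return losses

# Toy profiles for a hypothetical run on 4 vs. 16 processors:
# the compute phase scales linearly, the exchange phase does not.
prof_4 = {"main/solve": 80.0, "main/exchange": 20.0}
prof_16 = {"main/solve": 20.0, "main/exchange": 20.0}

losses = scaling_loss(prof_4, prof_16, p=4, q=16)
# "main/solve" meets the expectation (80 * 4/16 = 20), so its loss is 0;
# "main/exchange" was expected to cost 5 but cost 20, so its excess work
# (15) accounts for 15 / 40 = 0.375 of the 16-processor execution time.
```

For weak scaling, the same scheme applies with a constant-time expectation (`expected = profile_p.get(node, 0.0)`); the per-node loss fractions are what a top-down browser would surface alongside the source code.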