Performance modeling for systematic performance tuning

Authors:
Torsten Hoefler; William Gropp; William Kramer; Marc Snir
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL (all authors)
Venue:
State of the Practice Reports
Year:
2011

Abstract

The performance of parallel scientific applications depends on many factors that are determined by the execution environment and the parallel application. Especially on large parallel systems, it is too expensive to explore the solution space with a series of experiments. Deriving analytical models for applications and platforms allows estimating and extrapolating their execution performance, bottlenecks, and the potential impact of optimization options. We propose to use such "performance modeling" techniques beginning from the application design process throughout the whole software development cycle and also during the lifetime of supercomputer systems. Such models help to guide supercomputer system design and re-engineering efforts to adapt applications to changing platforms and allow users to estimate costs to solve a particular problem. Models can often be built with the help of well-known performance profiling tools. We discuss how we successfully used modeling throughout the proposal, initial testing, and early deployment phases of the Blue Waters supercomputer system.