Using automated performance modeling to find scalability bugs in complex codes

Authors:
Alexandru Calotoiu;Torsten Hoefler;Marius Poke;Felix Wolf
Affiliations:
RWTH Aachen University, Aachen, Germany;ETH Zurich, Zurich, Switzerland;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point at which remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. In this paper, we show how both the coverage and the speed of this scalability analysis can be substantially improved. By automatically generating an empirical performance model for each part of a parallel program, we can easily identify those parts that will reduce performance at larger core counts. Using a climate simulation as an example, we demonstrate that scalability bugs are not confined to those routines usually chosen as kernels.
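
To illustrate the idea of fitting an empirical model per program part, here is a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each part's runtime can be approximated by a single term of the form t(p) = c0 + c1 * p^i * log2(p)^j, and that the exponents are chosen from a small hypothetical candidate set by least squares. The function names, the candidate exponents, and the region names in the usage note are all assumptions.

```python
import math
from itertools import product


def fit_term(ps, ts, i, j):
    """Least-squares fit of t = c0 + c1 * p**i * log2(p)**j.

    Returns (c0, c1, rss), where rss is the residual sum of squares.
    """
    xs = [p ** i * math.log2(p) ** j for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant candidate (i = j = 0): only the intercept can be fitted.
        c1 = 0.0
    else:
        c1 = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts)) / sxx
    c0 = mean_t - c1 * mean_x
    rss = sum((c0 + c1 * x - t) ** 2 for x, t in zip(xs, ts))
    return c0, c1, rss


def best_model(ps, ts, exponents=(0, 0.5, 1, 2), log_exponents=(0, 1)):
    """Try every (i, j) candidate and keep the one with the smallest rss.

    Returns (i, j, c0, c1, rss). The candidate sets are illustrative defaults.
    """
    best = None
    for i, j in product(exponents, log_exponents):
        c0, c1, rss = fit_term(ps, ts, i, j)
        if best is None or rss < best[-1]:
            best = (i, j, c0, c1, rss)
    return best


def rank_regions(ps, measurements):
    """Fit one model per region and sort by growth, fastest-growing first.

    `measurements` maps a region name to its runtimes at the core counts `ps`.
    Regions with no measurable growth (c1 <= 0) are treated as constant.
    """
    models = {name: best_model(ps, ts) for name, ts in measurements.items()}

    def growth(model):
        i, j, _c0, c1, _rss = model
        return (i, j) if c1 > 0 else (0, 0)

    return sorted(models.items(), key=lambda kv: growth(kv[1]), reverse=True)
```

As a usage example, one could call `rank_regions([64, 128, 256, 512, 1024], {"halo_exchange": [...], "compute_kernel": [...]})` with measured runtimes per region (the region names are hypothetical). Regions whose fitted term grows fastest with the core count appear first and are the candidates for a scalability bug, whether or not they belong to a kernel that would normally be modeled by hand.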