Statistical scalability analysis of communication operations in distributed applications

Authors:
Jeffrey S. Vetter; Michael O. McCracken
Affiliations:
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California; Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California
Venue:
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Year:
2001

Abstract

Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. This unprecedented degree of parallelism will undoubtedly expose scalability limitations in existing applications, where scalability is the ability of a parallel algorithm on a parallel architecture to effectively utilize an increasing number of processors. Users will need precise and automated techniques for detecting the cause of limited scalability. This paper addresses this dilemma. First, we argue that users face numerous challenges in understanding application scalability: managing substantial amounts of experiment data, extracting useful trends from this data, and reconciling performance information with their application's design. Second, we propose a solution to automate this data analysis problem by applying fundamental statistical techniques to scalability experiment data. Finally, we evaluate our operational prototype on several applications, and show that statistical techniques offer an effective strategy for assessing application scalability. In particular, we find that non-parametric correlation of the number of tasks to the ratio of the time for communication operations to overall communication time provides a reliable measure for identifying communication operations that scale poorly.
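
The measure described in the last sentence of the abstract can be illustrated with a minimal sketch. It assumes Spearman's rank correlation as the non-parametric coefficient (the abstract does not name one), and the operation labels, task counts, and timings are invented for illustration and are not results from the paper. The helper names `average_ranks`, `spearman`, and `rank_operations` are likewise hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no trend information
    return cov / (sx * sy)


def rank_operations(task_counts, op_times):
    """Correlate task count with each operation's share of communication time.

    op_times maps an operation label to its measured time at each task count.
    Returns (label, rho) pairs, largest rho (worst scaling) first.
    """
    totals = [sum(times[i] for times in op_times.values())
              for i in range(len(task_counts))]
    scores = {}
    for op, times in op_times.items():
        ratios = [t / total for t, total in zip(times, totals)]
        scores[op] = spearman(task_counts, ratios)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scalability experiment: one run per task count.
task_counts = [16, 32, 64, 128, 256]
op_times = {  # seconds per operation; made-up values
    "allreduce": [1.0, 2.0, 4.5, 9.0, 20.0],
    "send": [5.0, 5.2, 5.1, 5.3, 5.2],
    "barrier": [0.5, 0.5, 0.6, 0.5, 0.6],
}
ranked = rank_operations(task_counts, op_times)
for op, rho in ranked:
    print(f"{op:10s} rho = {rho:+.2f}")
```

In this made-up data set, the share of communication time spent in `allreduce` grows at every step, so its coefficient is close to +1 and it is flagged first. The other two operations take a shrinking share and score close to -1. A rank-based coefficient is used here because it only asks whether the share rises monotonically with the task count, without assuming a linear relationship.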