Communication characteristics of large-scale scientific applications for contemporary cluster architectures

Authors:
Jeffrey S. Vetter; Frank Mueller
Affiliations:
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA; Department of Computer Science, North Carolina State University, 448 EGRC, Raleigh, NC
Venue:
Journal of Parallel and Distributed Computing, Special section: Best papers from the 2002 International Parallel and Distributed Processing Symposium
Year:
2003

Abstract

This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor, we observe each application's inherent behavioral characteristics: point-to-point communication, collective communication, and floating-point operations. Furthermore, we explore the sensitivity of these characteristics to both problem size and number of processors. Our analysis reveals several striking similarities across our diverse set of applications, including the use of collective operations, especially collectives with very small data payloads. We also highlight a trend of novel applications moving away from regimented, static communication patterns toward dynamically evolving patterns, as evidenced by our experiments on applications that use implicit linear solvers and adaptive mesh refinement. Overall, our study contributes to a better understanding of the computation and communication demands of current and emerging paradigms of scientific computing.
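The characterization summarized above amounts to tallying intercepted MPI calls by kind (point-to-point versus collective) and by payload size, then asking whether communication partners stay fixed over time. The Python sketch below shows one way to do that on an already-collected trace. It is illustrative only: the `CommEvent` record, the routine lists, the power-of-two size buckets, and the 64-byte cutoff for a "very small" payload are all assumptions, not the authors' instrumentation or thresholds. The `pattern_changes` helper is likewise a hypothetical, crude stand-in for distinguishing static from dynamically evolving patterns.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical trace record: one entry per intercepted MPI call.
@dataclass(frozen=True)
class CommEvent:
    rank: int    # MPI rank that issued the call
    op: str      # MPI routine name, e.g. "MPI_Isend" or "MPI_Allreduce"
    nbytes: int  # payload size in bytes

# Illustrative classification; extend for the routines a real trace contains.
COLLECTIVES = {"MPI_Allreduce", "MPI_Reduce", "MPI_Bcast", "MPI_Barrier",
               "MPI_Alltoall", "MPI_Allgather", "MPI_Gather", "MPI_Scatter"}
POINT_TO_POINT = {"MPI_Send", "MPI_Isend", "MPI_Recv", "MPI_Irecv",
                  "MPI_Sendrecv"}

SMALL_PAYLOAD_BYTES = 64  # assumed cutoff for a "very small" payload


def size_bucket(nbytes: int) -> int:
    """Round nbytes up to the next power of two (0 stays 0)."""
    if nbytes <= 0:
        return 0
    return 1 << (nbytes - 1).bit_length()


def summarize(events, small_threshold=SMALL_PAYLOAD_BYTES):
    """Count calls and build payload-size histograms per routine class."""
    counts = Counter()
    histogram = defaultdict(Counter)
    small_collectives = 0
    for ev in events:
        if ev.op in COLLECTIVES:
            kind = "collective"
            if ev.nbytes <= small_threshold:
                small_collectives += 1
        elif ev.op in POINT_TO_POINT:
            kind = "p2p"
        else:
            kind = "other"
        counts[kind] += 1
        histogram[kind][size_bucket(ev.nbytes)] += 1
    n_coll = counts["collective"]
    return {
        "counts": dict(counts),
        "histogram": {kind: dict(h) for kind, h in histogram.items()},
        "small_collective_fraction": small_collectives / n_coll if n_coll else 0.0,
    }


def pattern_changes(pairs_per_step):
    """Count time steps whose set of (source, destination) pairs differs
    from the previous step; 0 suggests a static pattern."""
    steps = [frozenset(pairs) for pairs in pairs_per_step]
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev != cur)


if __name__ == "__main__":
    trace = [
        CommEvent(0, "MPI_Allreduce", 8),
        CommEvent(1, "MPI_Allreduce", 8),
        CommEvent(0, "MPI_Bcast", 4096),
        CommEvent(0, "MPI_Isend", 32768),
        CommEvent(1, "MPI_Irecv", 32768),
    ]
    report = summarize(trace)
    print(report["counts"])                     # {'collective': 3, 'p2p': 2}
    print(report["small_collective_fraction"])  # 0.6666666666666666
    print(pattern_changes([[(0, 1), (1, 0)], [(1, 0), (0, 1)], [(0, 1), (1, 2)]]))  # 1
```

Bucketing by powers of two keeps the histogram compact while still separating tiny control messages (such as 8-byte reductions) from bulk halo or transpose exchanges; a real study would feed in counts gathered through an MPI profiling layer rather than a hand-built list.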