Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures

Authors:
Jeffrey S. Vetter; Frank Mueller
Venue:
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Year:
2002

Abstract

This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor, we observe each application's inherent behavioral characteristics: point-to-point communication, collective communication, and floating-point operations. Furthermore, we explore the sensitivity of these characteristics to both problem size and number of processors. Our analysis reveals several striking similarities across our diverse set of applications, including the use of collective operations, especially collectives with very small data payloads. We also highlight a trend of novel applications parting with regimented, static communication patterns in favor of dynamically evolving patterns, as evidenced by our experiments on applications that use implicit linear solvers and adaptive mesh refinement. Overall, our study contributes to a better understanding of the computation and communication demands of current and emerging paradigms of scientific computing.
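The core measurement described in the abstract, which is classifying each application's MPI calls as point-to-point or collective and examining their payload sizes, can be illustrated with a small post-processing sketch. This is not the authors' tool and does not cover the hardware-counter side of the study. It assumes a hypothetical trace of (MPI routine name, payload bytes) records, such as a wrapper built on the MPI profiling (PMPI) interface might log; the `MpiEvent` record, the routine lists, and the 32-byte "small payload" threshold are all illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical trace record for one MPI call (field names are illustrative).
@dataclass(frozen=True)
class MpiEvent:
    op: str      # MPI routine name, e.g. "MPI_Isend" or "MPI_Allreduce"
    nbytes: int  # payload size in bytes for this call


# Partial routine lists used for classification (assumption: enough for this trace).
POINT_TO_POINT = {"MPI_Send", "MPI_Isend", "MPI_Recv", "MPI_Irecv", "MPI_Sendrecv"}
COLLECTIVE = {"MPI_Allreduce", "MPI_Reduce", "MPI_Bcast", "MPI_Barrier",
              "MPI_Alltoall", "MPI_Allgather", "MPI_Gather", "MPI_Scatter"}


def size_bucket(nbytes: int) -> int:
    """Round a payload size up to the next power of two; 0 stays 0."""
    if nbytes <= 0:
        return 0
    return 1 << (nbytes - 1).bit_length()


def summarize(events, small_threshold=32):
    """Count calls by class, build a (class, size bucket) histogram, and
    return the fraction of collective calls with payload <= small_threshold."""
    calls = Counter()
    histogram = Counter()
    small_collectives = 0
    for ev in events:
        if ev.op in COLLECTIVE:
            kind = "collective"
            if ev.nbytes <= small_threshold:
                small_collectives += 1
        elif ev.op in POINT_TO_POINT:
            kind = "point-to-point"
        else:
            kind = "other"
        calls[kind] += 1
        histogram[(kind, size_bucket(ev.nbytes))] += 1
    n_coll = calls["collective"]
    frac_small = small_collectives / n_coll if n_coll else 0.0
    return calls, histogram, frac_small


# Example with a made-up trace (not data from the paper).
trace = [
    MpiEvent("MPI_Allreduce", 8),   # e.g. a single double, as in a dot product
    MpiEvent("MPI_Allreduce", 8),
    MpiEvent("MPI_Barrier", 0),     # barriers carry no payload
    MpiEvent("MPI_Isend", 65536),
    MpiEvent("MPI_Irecv", 65536),
    MpiEvent("MPI_Bcast", 4096),
]
calls, hist, frac = summarize(trace)
print(dict(calls), frac)
```

Bucketing sizes by powers of two keeps the histogram compact across many orders of magnitude, which is one common way to make a dominance of tiny collective payloads (such as 8-byte reductions) stand out against large point-to-point transfers.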