Performance characterization of global address space applications: a case study with NWChem

Authors:
Jeff R. Hammond; Sriram Krishnamoorthy; Sameer Shende; Nichols A. Romero; Allen D. Malony
Affiliations:
Argonne National Laboratory, Argonne, IL, USA; Pacific Northwest National Laboratory, Richland, WA, USA; University of Oregon, Eugene, OR, USA; Argonne National Laboratory, Argonne, IL, USA; University of Oregon, Eugene, OR, USA
Venue:
Concurrency and Computation: Practice & Experience
Year:
2012

Abstract

The use of global address space languages and one-sided communication for complex applications is gaining attention in the parallel computing community. However, the lack of good evaluation methods for observing performance at multiple levels makes it difficult to isolate the causes of performance deficiencies and to understand the fundamental limitations of system and application design that future improvements must address. NWChem is a popular computational chemistry package that depends on the Global Arrays/Aggregate Remote Memory Copy Interface (ARMCI) suite for partitioned global address space functionality to deliver high-end molecular modeling capabilities. A workload characterization methodology was developed to support NWChem performance engineering on large-scale parallel platforms. The research involved both the integration of performance instrumentation and measurement into the NWChem software and the analysis of one-sided communication performance in the context of NWChem workloads. Scaling studies were conducted for NWChem on Blue Gene/P and on two large-scale clusters that use different generations of InfiniBand interconnects and x86 processors. The performance analysis and results show that subtle changes in the runtime parameters of the communication subsystem can have a significant impact on performance behavior. The tool has successfully identified several algorithmic bottlenecks, which computational chemists are already addressing to improve NWChem performance. Copyright © 2011 John Wiley & Sons, Ltd.