The use of global address space languages and one-sided communication for complex applications is gaining attention in the parallel computing community. However, the lack of good evaluative methods for observing multiple levels of performance makes it difficult to isolate the cause of performance deficiencies and to understand the fundamental limitations of system and application design for future improvement. NWChem is a popular computational chemistry package that depends on the Global Arrays/Aggregate Remote Memory Copy Interface suite for partitioned global address space functionality to deliver high-end molecular modeling capabilities. A workload characterization methodology was developed to support NWChem performance engineering on large-scale parallel platforms. The research involved both the integration of performance instrumentation and measurement in the NWChem software and the analysis of one-sided communication performance in the context of NWChem workloads. Scaling studies were conducted for NWChem on Blue Gene/P and on two large-scale clusters using different generations of InfiniBand interconnects and x86 processors. The performance analysis and results show how subtle changes in the runtime parameters related to the communication subsystem can have a significant impact on performance behavior. The tool has successfully identified several algorithmic bottlenecks, which are already being tackled by computational chemists to improve NWChem performance. Copyright © 2011 John Wiley & Sons, Ltd.