Space-time trade-off optimization for a class of electronic structure calculations

Authors:
Daniel Cociorva;Gerald Baumgartner;Chi-Chung Lam;P. Sadayappan;J. Ramanujam;Marcel Nooijen;David E. Bernholdt;Robert Harrison
Affiliations:
Ohio State University;Ohio State University;Ohio State University;Ohio State University;Louisiana State University;Princeton University;Oak Ridge National Laboratory;Pacific Northwest National Laboratory
Venue:
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Year:
2002

Abstract

The accurate modeling of the electronic structure of atoms and molecules is very computationally intensive. Many models of electronic structure, such as the Coupled Cluster approach, involve collections of tensor contractions. There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number of arithmetic operations. In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the form with lowest cost that fits within a specified memory limit. Its utility is demonstrated by applying it to a computation representative of a component in the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.
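The kind of trade-off the abstract describes can be illustrated with a toy sketch. This is not the paper's algorithm: the functions `f` and `g`, the two hand-written variants, and the selection routine `pick_variant` are all hypothetical, and the loop order is assumed fixed with `k` outermost so that the only choice is between storing the intermediate and recomputing it. One variant keeps an n-by-n temporary and evaluates `f` once per element; the other keeps a single scalar and re-evaluates `f` for every `k`. The selector then picks the cheapest variant whose temporary fits within a given memory limit.

```python
from itertools import product


def f(i, j):
    # Hypothetical stand-in for an expensive per-element computation.
    return ((i + 1) * (j + 2)) % 7


def g(i, j, k):
    # Hypothetical second operand of the contraction.
    return (i + 2 * j + 3 * k) % 5


def store_intermediate(n):
    """Operation-minimal form: compute T[i][j] once and keep all of it."""
    f_calls = 0
    t = [[0] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        t[i][j] = f(i, j)
        f_calls += 1
    r = [
        sum(t[i][j] * g(i, j, k) for i in range(n) for j in range(n))
        for k in range(n)
    ]
    return r, {"temp_elements": n * n, "f_calls": f_calls}


def recompute_intermediate(n):
    """Space-saving form: keep one scalar, re-evaluate f for every k."""
    f_calls = 0
    r = [0] * n
    for k in range(n):
        for i in range(n):
            for j in range(n):
                t = f(i, j)  # scalar temporary, recomputed n times per (i, j)
                f_calls += 1
                r[k] += t * g(i, j, k)
    return r, {"temp_elements": 1, "f_calls": f_calls}


def pick_variant(n, memory_limit):
    """Return (name, result) of the cheapest variant within the memory limit."""
    feasible = []
    for variant in (store_intermediate, recompute_intermediate):
        result, cost = variant(n)
        if cost["temp_elements"] <= memory_limit:
            feasible.append((cost["f_calls"], variant.__name__, result))
    if not feasible:
        raise ValueError("no variant fits within the memory limit")
    _, name, result = min(feasible)
    return name, result
```

With `n = 4`, a limit of 16 temporary elements admits the stored form (16 evaluations of `f`), while a limit of 8 forces the recomputing form (64 evaluations); both yield the same result vector. A real search over the forms described in the abstract would consider many more candidates than these two.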