Accurate modeling of the electronic structure of atoms and molecules is computationally intensive. Many models of electronic structure, such as the Coupled Cluster approach, involve collections of tensor contractions. These contractions can usually be implemented in many alternative ways, each representing a different trade-off between the space required for temporary intermediates and the total number of arithmetic operations. In this paper, we present an algorithm that starts with an operation-minimal form of the computation and systematically explores the possible space-time trade-offs to identify the lowest-cost form that fits within a specified memory limit. Its utility is demonstrated by applying it to a computation representative of a component of the CCSD(T) formulation in the NWChem quantum chemistry suite from Pacific Northwest National Laboratory.
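The selection step described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the cost model in `toy_candidates` (an n-by-n intermediate held in b-by-b tiles and recomputed n/b times) is a hypothetical stand-in, and the names `Candidate`, `toy_candidates`, and `cheapest_within_limit` are assumptions introduced only for this illustration. The sketch shows just the final choice: among candidate forms with known space and operation counts, pick the one with the fewest operations that fits the memory limit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical implementation form of a contraction sequence."""
    name: str
    memory_words: int  # space needed for temporary intermediates
    operations: int    # total arithmetic operations


def toy_candidates(n: int, tile_sizes: Iterable[int]) -> List[Candidate]:
    """Enumerate candidates under an assumed toy cost model.

    Assumption (illustrative only): the intermediate is stored in b*b tiles,
    and shrinking the tile forces the work to be redone n // b times.
    """
    base_ops = 2 * n ** 3  # operation count of the operation-minimal form
    return [
        Candidate(name=f"tile={b}", memory_words=b * b,
                  operations=base_ops * (n // b))
        for b in tile_sizes
    ]


def cheapest_within_limit(candidates: Iterable[Candidate],
                          memory_limit: int) -> Optional[Candidate]:
    """Return the feasible candidate with the fewest operations, or None."""
    feasible = [c for c in candidates if c.memory_words <= memory_limit]
    return min(feasible, key=lambda c: c.operations, default=None)


if __name__ == "__main__":
    cands = toy_candidates(64, [64, 32, 16, 8])
    print(cheapest_within_limit(cands, memory_limit=1500))
```

With a generous limit the operation-minimal form (the largest tile) is chosen. As the limit tightens, the choice moves to forms that trade extra arithmetic for less temporary storage, and `None` is returned when no candidate fits.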