Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

Authors:
Daniel Cociorva; J. W. Wilkins; Gerald Baumgartner; P. Sadayappan; J. Ramanujam; Marcel Nooijen; David E. Bernholdt; Robert J. Harrison
Venue:
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Year:
2001

Citing 25
Cited 13

The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures

Circuits, Systems, and Signal Processing
Compiler cache optimizations for banded matrix problems

ICS '95 Proceedings of the 9th international conference on Supercomputing
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
Precise miss analysis for program transformations with caches of arbitrary associativity

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
New tiling techniques to improve cache temporal locality

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An experimental evaluation of tiling and shackling for memory hierarchy management

ICS '99 Proceedings of the 13th international conference on Supercomputing
Quantifying the multi-level nature of tiling interactions

International Journal of Parallel Programming
Fast greedy weighted fusion

Proceedings of the 14th international conference on Supercomputing
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests

Proceedings of the 14th international conference on Supercomputing
Loop optimization for a class of memory-constrained computations

ICS '01 Proceedings of the 15th international conference on Supercomputing
SPL: a language and compiler for DSP algorithms

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Tuning Strassen's matrix multiplication for memory efficiency

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Memory-Optimal Evaluation of Expression Trees Involving Large Objects

HiPC '99 Proceedings of the 6th International Conference on High Performance Computing
Optimal Reordering and Mapping of a Class of Nested-Loops for Parallel Execution

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Collective Loop Fusion for Array Contraction

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Performance optimization of a class of loops implementing multidimensional integrals

Performance optimization of a class of loops implementing multidimensional integrals

Space-time trade-off optimization for a class of electronic structure calculations

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A high-level approach to synthesis of high-performance codes for quantum chemistry

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Cache Miss Characterization and Data Locality Optimization for Imperfectly Nested Loops on Shared Memory Multiprocessors

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Performance modeling and optimization of parallel out-of-core tensor contractions

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Layout transformation support for the disk resident arrays framework

The Journal of Supercomputing
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver

Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Efficient parallel out-of-core matrix transposition

International Journal of High Performance Computing and Networking
Memory-optimal evaluation of expression trees involving large objects

Computer Languages, Systems and Structures
Efficient search-space pruning for integrated fusion and tiling transformations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Memory-constrained communication minimization for a class of array computations

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Efficient layout transformation for disk-based multidimensional arrays

HiPC'04 Proceedings of the 11th international conference on High Performance Computing

Abstract

Our project aims to build a program synthesis system that eases the development of high-performance parallel programs for a class of computations encountered in computational chemistry and computational physics. These computations arise in electronic structure calculations and can be expressed as a set of tensor contractions. This paper gives an overview of the planned synthesis system, which will take a high-level specification of the computation as input and generate high-performance parallel code for a number of target architectures. We focus on an approach to data locality optimization in this context. Preliminary experimental results on an SGI Origin 2000 are encouraging and indicate that the approach is effective.
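
To make the terms in the abstract concrete, the following is a minimal sketch of a single tensor contraction, C[i][j] = sum over k of A[i][k] * B[k][j], written first as a plain loop nest and then with loop tiling (blocking), a standard data locality transformation. It is an illustration only and is not taken from the paper: the function names `contract_naive` and `contract_tiled` and the `tile` parameter are hypothetical, and the synthesis system described in the abstract may choose loop structures quite differently.

```python
def contract_naive(A, B):
    """Contract over the shared index k: C[i][j] = sum_k A[i][k] * B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def contract_tiled(A, B, tile=2):
    """Same contraction, with all three loops split into tiles.

    Assumption for illustration: `tile` is chosen so that the blocks of
    A, B, and C touched by the inner loops fit in cache together, which
    lets each block be reused before it is evicted.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Outer loops step from tile to tile.
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                # Inner loops stay within one tile; min() handles edge tiles
                # when a dimension is not a multiple of the tile size.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[7, 8], [9, 10], [11, 12]]
    print(contract_naive(A, B))
    print(contract_tiled(A, B, tile=2))
```

Both functions perform the same arithmetic and produce the same result; only the order in which array elements are visited changes. In interpreted Python the tiled version will not run faster, so the sketch shows the structure of the transformation rather than its performance effect.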