Performance modeling and optimization of parallel out-of-core tensor contractions

Authors:
Xiaoyang Gao;Swarup Kumar Sahoo;Chi-Chung Lam;J. Ramanujam;Qingda Lu;Gerald Baumgartner;P. Sadayappan
Affiliations:
The Ohio State University;The Ohio State University;The Ohio State University;Louisiana State University;The Ohio State University;Louisiana State University;The Ohio State University
Venue:
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2005

Abstract

The Tensor Contraction Engine (TCE) is a domain-specific compiler for implementing complex tensor contraction expressions arising in quantum chemistry applications modeling electronic structure. This paper develops a performance model for tensor contractions that accounts for both disk I/O and inter-processor communication costs, to facilitate performance-model-driven loop optimization for this domain. Experimental results are provided that demonstrate the accuracy and effectiveness of the model.