The Tensor Contraction Engine (TCE) is a domain-specific compiler for implementing complex tensor contraction expressions arising in quantum chemistry applications that model electronic structure. This paper develops a performance model for tensor contractions that accounts for both disk I/O and inter-processor communication costs, enabling performance-model-driven loop optimization for this domain. Experimental results demonstrate the accuracy and effectiveness of the model.
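To make the idea of model-driven loop optimization concrete, the following is a minimal sketch rather than the model developed in the paper. It treats the simplest contraction, C[i,j] += A[i,k] * B[k,j], as a tiled out-of-core loop nest, estimates disk and communication time from assumed bandwidth and latency parameters, and picks the tile sizes with the lowest estimate that fit in memory. The names `Machine`, `contraction_cost`, and `best_tiles` are hypothetical, as are the even split of work across processes and the assumption that a fraction (P-1)/P of operand data is remote.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (all values are assumptions)."""
    disk_bw: float      # disk bandwidth, elements per second
    disk_seek: float    # fixed cost per disk I/O operation, seconds
    net_bw: float       # network bandwidth, elements per second
    net_latency: float  # fixed cost per message, seconds
    mem_elems: int      # memory available for tiles, in elements
    procs: int          # number of processes


def contraction_cost(n_i, n_j, n_k, t_i, t_j, t_k, m):
    """Estimated seconds for C[i,j] += A[i,k] * B[k,j] with the given tiles."""
    tiles_i, tiles_j, tiles_k = ceil(n_i / t_i), ceil(n_j / t_j), ceil(n_k / t_k)
    # A C tile stays in memory across the k loop; A is re-read once per
    # j tile and B once per i tile, while C is written exactly once.
    read_a = n_i * n_k * tiles_j
    read_b = n_k * n_j * tiles_i
    write_c = n_i * n_j
    # Assumption: disk traffic is split evenly across processes.
    disk_elems = (read_a + read_b + write_c) / m.procs
    disk_ops = (2 * tiles_i * tiles_j * tiles_k + tiles_i * tiles_j) / m.procs
    # Assumption: a fraction (P-1)/P of operand tiles live on other processes.
    remote = (m.procs - 1) / m.procs
    comm_elems = remote * (read_a + read_b) / m.procs
    messages = remote * 2 * tiles_i * tiles_j * tiles_k / m.procs
    io_time = disk_elems / m.disk_bw + disk_ops * m.disk_seek
    comm_time = comm_elems / m.net_bw + messages * m.net_latency
    return io_time + comm_time


def best_tiles(n_i, n_j, n_k, m, candidates):
    """Return (cost, (t_i, t_j, t_k)) minimizing the estimate under the memory limit."""
    feasible = [
        (contraction_cost(n_i, n_j, n_k, t_i, t_j, t_k, m), (t_i, t_j, t_k))
        for t_i, t_j, t_k in product(candidates, repeat=3)
        # One tile each of A, B, and C must fit in memory at once.
        if t_i * t_k + t_k * t_j + t_i * t_j <= m.mem_elems
    ]
    if not feasible:
        raise ValueError("no candidate tile sizes fit in memory")
    return min(feasible)
```

Calling `best_tiles(4, 4, 4, machine, [1, 2, 4])` enumerates every combination of the candidate sizes. A real optimizer would search a much larger space of loop orders and tilings and would calibrate the parameters against measurements, but the structure is the same: a cost formula plus a constrained search.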