A framework for load balancing of tensor contraction expressions via dynamic task partitioning

Authors:
Pai-Wei Lai;Kevin Stock;Samyam Rajbhandari;Sriram Krishnamoorthy;P. Sadayappan
Affiliations:
The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;Pacific Northwest National Laboratory, Richland, WA;The Ohio State University, Columbus, OH
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013



Abstract

In this paper, we introduce Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of work (tasks), represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by executing tasks from independent contractions concurrently through a dynamic load-balancing runtime. We demonstrate improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers, using examples from Coupled Cluster (CC) methods.
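
The idea of decomposing contractions into tasks and balancing them dynamically can be illustrated with a minimal sketch. It is not the DLTC API: the names `tile_ranges`, `task_iterator`, `run_task`, and `execute` are hypothetical, the contraction is restricted to the matrix-multiply form C[i][j] += sum_k A[i][k] * B[k][j], and a shared thread-safe queue stands in for whatever load-balancing runtime the library actually uses.

```python
import itertools
import queue
import threading


def tile_ranges(n, tile):
    """Split range(n) into consecutive (start, stop) tiles."""
    return [(s, min(s + tile, n)) for s in range(0, n, tile)]


def task_iterator(name, a, b, c, tile):
    """Yield one task per output tile of C[i][j] += sum_k A[i][k] * B[k][j].

    Hypothetical stand-in for the "iterator" abstraction in the abstract.
    """
    rows, cols = len(a), len(b[0])
    for i_rng, j_rng in itertools.product(tile_ranges(rows, tile),
                                          tile_ranges(cols, tile)):
        yield (name, a, b, c, i_rng, j_rng)


def run_task(task):
    """Compute one output tile; tiles are disjoint, so no locking is needed."""
    _, a, b, c, (i0, i1), (j0, j1) = task
    for i in range(i0, i1):
        for j in range(j0, j1):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))


def execute(contractions, tile=2, workers=4):
    """Run the tasks of all independent contractions from one shared pool."""
    tasks = queue.Queue()
    for name, a, b, c in contractions:
        for task in task_iterator(name, a, b, c, tile):
            tasks.put(task)

    def worker():
        # Each worker pulls the next available task, whichever contraction
        # it belongs to, until the pool is empty.
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                return
            run_task(task)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because workers take tasks from a single pool rather than being assigned a fixed contraction, a small contraction does not leave workers idle while a larger one is still running. The sketch assumes the contractions passed to `execute` are independent of one another (no output of one is an input of another).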