In this paper, we introduce Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computations encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units, called tasks, which are represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by executing tasks from independent contractions concurrently through a dynamic load-balancing runtime. We demonstrate improved performance, scalability, and flexibility in computing tensor contraction expressions on parallel computers, using examples from Coupled Cluster (CC) methods.
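The decomposition described above can be illustrated with a minimal Python sketch. It is not the DLTC API: the names `tile_tasks`, `run_task`, and `contract_all` are hypothetical, a plain matrix product stands in for a general tensor contraction, and a shared thread-safe queue stands in for the dynamic load-balancing runtime. Each contraction is split into per-tile tasks by a generator (playing the role of an iterator), and tasks from independent contractions are placed in one pool so that idle workers pick up whatever work remains.

```python
import queue
import threading


def tile_tasks(name, n_rows, n_cols, tile):
    """Hypothetical 'iterator': yield one task per output tile of a contraction."""
    for i0 in range(0, n_rows, tile):
        for j0 in range(0, n_cols, tile):
            yield (name, i0, min(i0 + tile, n_rows), j0, min(j0 + tile, n_cols))


def run_task(task, operands, results):
    """Compute one output tile of C[i][j] = sum_k A[i][k] * B[k][j]."""
    name, i0, i1, j0, j1 = task
    A, B = operands[name]
    C = results[name]
    for i in range(i0, i1):
        for j in range(j0, j1):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(len(B)))


def contract_all(operands, tile=2, n_workers=4):
    """Run several independent contractions from one shared task pool."""
    # One zero-initialised output matrix per contraction.
    results = {
        name: [[0] * len(B[0]) for _ in A] for name, (A, B) in operands.items()
    }

    # Tasks from all contractions go into a single queue (assumed stand-in
    # for a dynamic load-balancing runtime).
    tasks = queue.Queue()
    for name, (A, B) in operands.items():
        for task in tile_tasks(name, len(A), len(B[0]), tile):
            tasks.put(task)

    def worker():
        # Each worker pulls the next available task until the pool is empty.
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                return
            run_task(task, operands, results)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    ops = {
        "X": ([[1, 2], [3, 4]], [[5, 6], [7, 8]]),
        "Y": ([[1, 0, 2]], [[1], [2], [3]]),
    }
    print(contract_all(ops, tile=1, n_workers=3))
```

Because every task writes a disjoint output tile, the workers need no locking beyond the queue itself. A real runtime would distribute tasks across nodes rather than threads; the shared pool here only shows why mixing tasks from independent contractions helps keep all workers busy.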