The application of High Performance Computing to Quantum Chemical (QC) calculations faces many challenges. A central step is the solution of the generalized eigenvalue problem of a Hamilton matrix. Although in many cases its execution time is small relative to other numerical tasks, its complexity of $O(N^3)$ is higher, so it becomes more significant in larger applications. For parallel QC codes, it is therefore advantageous to have a scalable solver for this step. We investigate the case where the symmetry of a molecule leads to a block-diagonal matrix structure, which complicates the efficient use of available parallel eigensolvers. We present a technique that employs a malleable parallel task scheduling (MPTS) algorithm to schedule instances of sequential and parallel eigensolver routines from LAPACK and ScaLAPACK. In this way, an efficient use of hardware resources is guaranteed while overall scalability is facilitated. Finally, we evaluate the proposed technique for electronic structure calculations of real chemical systems. For the systems considered, performance improved by factors of up to 8.4 compared to the previously used nonmalleable parallel scheduling approach.
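To illustrate why block-diagonal structure makes scheduling matter, the following minimal sketch assigns one sequential eigensolve per block to processors with a greedy longest-processing-time-first rule and compares the resulting makespan with an idealized bound. It is not the MPTS algorithm of the paper. The cost model (`n**3` work units per block), the perfect-speedup bound, and all function names are assumptions made for illustration only.

```python
import heapq


def block_cost(n):
    """Assumed cost model: a dense eigensolve on an n x n block costs n**3 work units."""
    return n ** 3


def lpt_schedule(block_sizes, num_procs):
    """Greedy longest-processing-time-first assignment of sequential tasks.

    Each block is solved by exactly one processor (a nonmalleable schedule).
    Returns (makespan, assignment), where assignment[p] lists the block
    sizes given to processor p.
    """
    heap = [(0, p) for p in range(num_procs)]  # entries are (current load, processor id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_procs)]
    for n in sorted(block_sizes, reverse=True):
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        assignment[p].append(n)
        heapq.heappush(heap, (load + block_cost(n), p))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


def ideal_malleable_bound(block_sizes, num_procs):
    """Lower bound on the makespan if every task could be split across
    processors with perfect speedup (a hypothetical idealization)."""
    return sum(block_cost(n) for n in block_sizes) / num_procs


if __name__ == "__main__":
    sizes = [10, 4, 4, 2]  # hypothetical block dimensions
    makespan, assignment = lpt_schedule(sizes, num_procs=2)
    print("sequential-task makespan:", makespan)
    print("assignment:", assignment)
    print("idealized malleable bound:", ideal_malleable_bound(sizes, 2))
```

With one dominant block, the sequential-task schedule cannot finish faster than that block's cost, while the other processor sits mostly idle. Letting a task run on several processors at once, as a malleable schedule does with a parallel eigensolver, is what closes part of the gap to the idealized bound.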