LAPACK's user's guide
Runtime compilation techniques for data partitioning and communication schedule reuse
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A MATLAB to Fortran 90 translator and its effectiveness
ICS '96 Proceedings of the 10th international conference on Supercomputing
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A case for source-level transformations in MATLAB
Proceedings of the 2nd conference on Domain-specific languages
BILUTM: A Domain-Based Multilevel Block ILUT Preconditioner for General Sparse Matrices
SIAM Journal on Matrix Analysis and Applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
SPL: a language and compiler for DSP algorithms
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
MaJIC: compiling MATLAB for speed and responsiveness
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
An updated set of basic linear algebra subprograms (BLAS)
ACM Transactions on Mathematical Software (TOMS)
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A high-level approach to synthesis of high-performance codes for quantum chemistry
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An Automated Multilevel Substructuring Method for Eigenspace Computation in Linear Elastodynamics
SIAM Journal on Scientific Computing
Reusable, generic program analyses and transformations
GPCE '09 Proceedings of the eighth international conference on Generative programming and component engineering
Hi-index | 0.00 |
An important emerging problem domain in computational science and engineering is the development of multi-scale computational methods for complex problems in mechanics that span multiple spatial and temporal scales. An attractive approach to solving these problems is recursive decomposition: the problem is broken up into a tree of loosely coupled sub-problems which can be solved independently and then coupled back together to obtain the desired solution. However, a particular problem can be solved in myriad ways by coupling the sub-problems together in different tree orders. As we argue in this paper, the space of possible orders is vast, the performance gap between an arbitrary order and the best order is potentially quite large, and the likelihood that a domain scientist can find the best order to solve a problem on a particular machine is vanishingly small. In this paper, we present a system that uses domain-specific knowledge captured in computational libraries to optimize code written in a conventional language (C). The system generates efficient coupling orders to solve computational mechanics problems using recursive decomposition. Our system adopts the inspector-executor paradigm, where the problem is inspected and a novel heuristic finds an effective implementation based on domain properties evaluated by a cost model. The derived implementation is then executed by a parallel run-time system (Cilk) which achieves optimal parallel performance. We demonstrate that our cost model is highly correlated with actual application runtime, that our proposed technique outperforms non-decomposed and non-multiscale methods. The code generated by the heuristic also outperforms alternate scheduling strategies, as well as over 99% of randomly-generated recursive decompositions sampled from the space of possible solutions.