Many dense linear algebra (DLA) operations are easy to understand at a high level, and users can get functional DLA code running on new hardware relatively quickly. As a result, many people consider DLA to be a "solved domain." The truth is that DLA is not solved. DLA experts are rare because the "tricks" and the variety of algorithms needed to attain high performance take time to learn. High-performance DLA implementations become available on a new architecture only after an expert with enough experience goes through a rote process to implement many related DLA operations. That so much of this work remains manual and rote hardly suggests the domain is "solved." We will not have proven that we understand the field until we have automated the expert. Automate the expert for the entire field, and the field is closed. We view that goal as the equivalent of going to Mars. In practice, we will reach the moon automatically, and experts will then be freed to work out how to get from there to Mars.