DXML: a high-performance scientific subroutine library

Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Compiler blockability of dense matrix factorizations

ACM Transactions on Mathematical Software (TOMS)
Communication overlap in multi-tier parallel algorithms

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky

ICCS '02 Proceedings of the International Conference on Computational Science-Part II
Parallel and fully recursive multifrontal sparse Cholesky

Future Generation Computer Systems - Special issue: Selected numerical algorithms
Communication lower bounds for distributed-memory matrix multiplication

Journal of Parallel and Distributed Computing
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels

ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V

Hi-index	0.00