ACM Transactions on Mathematical Software (TOMS)
A three-dimensional approach to parallel matrix multiplication
IBM Journal of Research and Development
Algorithmic Redistribution Methods for Block-Cyclic Decompositions
IEEE Transactions on Parallel and Distributed Systems
Matrix Multiplication on Heterogeneous Platforms
IEEE Transactions on Parallel and Distributed Systems
A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)
IEEE Transactions on Computers
Dense linear algebra kernels on heterogeneous platforms: redistribution issues
Parallel Computing - Parallel matrix algorithms and applications
Parallel Factorizations with Algorithmic Blocking
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Parallel Complexity of Matrix Multiplication
The Journal of Supercomputing
Fault tolerant matrix operations using checksum and reverse computation
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
A New Parallel Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Matrix-Matrix Multiplication on Heterogeneous Platforms
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
A Flexible Class of Parallel Matrix Multiplication Algorithms
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
Logic-based eDRAM: origins and rationale for use
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Static LU Decomposition on Heterogeneous Platforms
International Journal of High Performance Computing Applications
Memory efficient parallel matrix multiplication operation for irregular problems
Proceedings of the 3rd conference on Computing frontiers
Distributed SBP Cholesky factorization algorithms with near-optimal scheduling
ACM Transactions on Mathematical Software (TOMS)
Implementing a parallel matrix factorization library on the cell broadband engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
The general matrix multiply-add operation on 2D torus
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Cache blocking for linear algebra algorithms
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
Toward scalable matrix multiply on multithreaded architectures
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing