An extended set of FORTRAN basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
ACM Transactions on Mathematical Software (TOMS)
ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
LAPACK Users' Guide
Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch
IBM Journal of Research and Development
POWER2: next generation of the RISC System/6000 family
IBM Journal of Research and Development
POWER2 fixed-point, data cache, and storage control units
IBM Journal of Research and Development
POWER2 floating-point unit: architecture and implementation
IBM Journal of Research and Development
IBM Journal of Research and Development
Instruction scheduling in the TOBEY compiler
IBM Journal of Research and Development
Design considerations for the PowerPC 601 microprocessor
IBM Journal of Research and Development
Implementation of the PowerPC 601 microprocessor
IBM Journal of Research and Development
Basic Linear Algebra Subprograms for Fortran Usage
ACM Transactions on Mathematical Software (TOMS)
Benchmark Evaluation of the IBM SP2 for Parallel Signal Processing
IEEE Transactions on Parallel and Distributed Systems
A super scalar sort algorithm for RISC processors
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
ACM Transactions on Mathematical Software (TOMS)
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark
ACM Transactions on Mathematical Software (TOMS)
The RISC BLAS: a blocked implementation of level 3 BLAS for RISC processors
ACM Transactions on Mathematical Software (TOMS)
A recursive formulation of Cholesky factorization of a matrix in packed storage
ACM Transactions on Mathematical Software (TOMS)
FLAME: Formal Linear Algebra Methods Environment
ACM Transactions on Mathematical Software (TOMS)
A Family of High-Performance Matrix Multiplication Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky
ICCS '02 Proceedings of the International Conference on Computational Science-Part II
LAWRA Workshop: Linear Algebra with Recursive Algorithms: http://lawra.uni-c.dk/lawra/
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
High-Performance Library Software for QR Factorization
PARA '00 Proceedings of the 5th International Workshop on Applied Parallel Computing, New Paradigms for HPC in Industry and Academia
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms
PPAM '01 Proceedings of the International Conference on Parallel Processing and Applied Mathematics-Revised Papers
Parallel and fully recursive multifrontal sparse Cholesky
Future Generation Computer Systems - Special issue: Selected numerical algorithms
Solving unsymmetric sparse systems of linear equations with PARDISO
Future Generation Computer Systems - Special issue: Selected numerical algorithms
High-performance linear algebra algorithms using new generalized data structures for matrices
IBM Journal of Research and Development
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Communication lower bounds for distributed-memory matrix multiplication
Journal of Parallel and Distributed Computing
A fully portable high performance minimal storage hybrid format Cholesky algorithm
ACM Transactions on Mathematical Software (TOMS)
Anatomy of high-performance matrix multiplication
ACM Transactions on Mathematical Software (TOMS)
A unified model for multicore architectures
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Implementing a parallel matrix factorization library on the cell broadband engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Cache-optimal algorithms for option pricing
ACM Transactions on Mathematical Software (TOMS)
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
Fast pseudorandom-number generators with modulus 2^k or 2^k-1 using fused multiply-add
IBM Journal of Research and Development
Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L
IBM Journal of Research and Development
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion
ACM Transactions on Mathematical Software (TOMS)
Minimal data copy for dense linear algebra factorization
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
In-place transposition of rectangular matrices
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Using non-canonical array layouts in dense matrix operations
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Is cache-oblivious DGEMM viable?
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
New data structures for matrices and specialized inner kernels: low overhead for high performance
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
A family of high-performance matrix multiplication algorithms
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I
Upper and lower I/O bounds for pebbling r-pyramids
Journal of Discrete Algorithms
The Journal of Supercomputing
Cache blocking for linear algebra algorithms
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I