Expressing Boolean cube matrix algorithms in shared memory primitives
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Matrix multiplication on the connection machine
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Communication efficient matrix multiplication on hypercubes
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
IEEE Transactions on Parallel and Distributed Systems
Graph Problems on a Mesh-Connected Processor Array
Journal of the ACM (JACM)
Associative and Parallel Processors
ACM Computing Surveys (CSUR)
Program transformation and runtime support for threaded MPI execution on shared-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Matrix Multiplication on the OTIS-Mesh Optoelectronic Computer
IEEE Transactions on Computers
Self-Routing Technique in Perfect-Shuffle Networks Using Control Tags
IEEE Transactions on Computers
Generation of Injective and Reversible Modular Mappings
IEEE Transactions on Parallel and Distributed Systems
NAS Experiences of Porting CM Fortran Codes to HPF on IBM SP2 and SGI Power Challenge
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
Heterogeneous Networks of Workstations and the Parallel Matrix Multiplication
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Architectures for an Efficient Application Execution in a Collection of HNOWS
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Hyper-systolic algorithms with applications in linear algebra and molecular dynamics
Highly parallel computations
Parallel Complexity of Matrix Multiplication
The Journal of Supercomputing
ISCA '80 Proceedings of the 7th annual symposium on Computer Architecture
Graph problems on a mesh-connected processor array (Preliminary Version)
STOC '82 Proceedings of the fourteenth annual ACM symposium on Theory of computing
An efficient VLSI dictionary machine
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Matrix Multiplications on the Memory_based Processor Array
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A New Parallel Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A Flexible Class of Parallel Matrix Multiplication Algorithms
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Application-Specific Scheduling for the Organic Grid
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
The Hierarchically Tiled Arrays programming approach
LCR '04 Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems
Programming for parallelism and locality with hierarchically tiled arrays
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The potential of the cell processor for scientific computing
Proceedings of the 3rd conference on Computing frontiers
Memory efficient parallel matrix multiplication operation for irregular problems
Proceedings of the 3rd conference on Computing frontiers
Parallel algorithms development for programmable logic devices
Advances in Engineering Software
IEEE Transactions on Parallel and Distributed Systems
A Self-Routing Benes Network and Parallel Permutation Algorithms
IEEE Transactions on Computers
Recursive Methods for Matrix Inversion in Pattern Recognition Environments
IEEE Transactions on Computers
Scientific computing Kernels on the cell processor
International Journal of Parallel Programming
Matrix product on heterogeneous master-worker platforms
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Performance without pain = productivity: data layout and collective communication in UPC
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
EURASIP Journal on Embedded Systems
Computationally efficient parallel matrix-matrix multiplication on the torus
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Performance evaluation of basic linear algebra subroutines on a matrix co-processor
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Fine tuning matrix multiplications on multicore
HiPC'08 Proceedings of the 15th international conference on High performance computing
Algorithmic issues in grid computing
Algorithms and theory of computation handbook
Optimized dense matrix multiplication on a many-core architecture
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
The general matrix multiply-add operation on 2D torus
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Graph expansion and communication costs of fast matrix multiplication: regular submission
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Batch Mode Active Learning for Networked Data
ACM Transactions on Intelligent Systems and Technology (TIST)
Journal of Parallel and Distributed Computing
Empirical performance-model driven data layout optimization
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Implementation of parallel numerical algorithms using hierarchically tiled arrays
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A high efficient on-chip interconnection network in SIMD CMPs
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
An efficient scheduler of RTOS for multi/many-core system
Computers and Electrical Engineering
Communication-optimal parallel algorithm for Strassen's matrix multiplication
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Communication avoiding and overlapping for numerical linear algebra
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Communication-avoiding parallel Strassen: implementation and performance
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures
Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
Graph expansion and communication costs of fast matrix multiplication
Journal of the ACM (JACM)
Avoiding communication through a multilevel LU factorization
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Fast parallel algorithms for blocked dense matrix multiplication on shared memory architectures
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Work-efficient matrix inversion in polylogarithmic time
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Communication optimal parallel multiplication of sparse random matrices
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
uBench: exposing the impact of CUDA block geometry in terms of performance
The Journal of Supercomputing
Communication costs of Strassen's matrix multiplication
Communications of the ACM