Memory storage patterns in parallel processing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Data optimization: allocation of arrays to reduce communication on SIMD machines
Journal of Parallel and Distributed Computing - Massively parallel computation
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Linear clustering of objects with multiple attributes
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The high performance Fortran handbook
Automatic data partitioning on distributed memory multicomputers
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimal evaluation of array expressions on massively parallel machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Balancing processor loads and exploiting data locality in N-body simulations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves
IEEE Transactions on Parallel and Distributed Systems
Analysis of the clustering properties of Hilbert space-filling curve
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
High performance Fortran for highly irregular problems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Tuning Strassen's matrix multiplication for memory efficiency
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Accuracy and Stability of Numerical Algorithms
Parallel Computer Architecture: A Hardware/Software Approach
Digital Design
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Efficient Procedures for Using Matrix Algorithms
Proceedings of the 2nd Colloquium on Automata, Languages and Programming
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Design and evaluation of a linear algebra package for Java
Proceedings of the ACM 2000 conference on Java Grande
Towards a theory of cache-efficient algorithms
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Efficient Representation Scheme for Multidimensional Array Operations
IEEE Transactions on Computers
Computation regrouping: restructuring programs for temporal data cache locality
ICS '02 Proceedings of the 16th international conference on Supercomputing
Global static indexing for real-time exploration of very large regular grids
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
Mixed Parallel Implementations of Strassen and Winograd Matrix Multiplication Algorithms
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Recursion Unrolling for Divide and Conquer Programs
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Parallel Complexity of Matrix Multiplication
The Journal of Supercomputing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
IEEE Transactions on Parallel and Distributed Systems
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Restructuring computations for temporal data cache locality
International Journal of Parallel Programming
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
SFCGen: A framework for efficient generation of multi-dimensional space-filling curves by recursion
ACM Transactions on Mathematical Software (TOMS)
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
ACM Transactions on Programming Languages and Systems (TOPLAS)
LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An experimental comparison of cache-oblivious and cache-conscious programs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Representation-transparent matrix algorithms with scalable performance
Proceedings of the 21st annual international conference on Supercomputing
Adaptive Strassen's matrix multiplication
Proceedings of the 21st annual international conference on Supercomputing
Mapping with Space Filling Surfaces
IEEE Transactions on Parallel and Distributed Systems
Compact Hilbert indices: Space-filling curves for domains with unequal side lengths
Information Processing Letters
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
On the limits of cache-oblivious rational permutations
Theoretical Computer Science
Using non-canonical array layouts in dense matrix operations
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Evaluating ISA support and hardware support for recursive data layouts
HiPC'07 Proceedings of the 14th international conference on High performance computing
ACM Transactions on Algorithms (TALG)
A study on load imbalance in parallel hypermatrix multiplication using OpenMP
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Minimizing associativity conflicts in morton layout
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Compiler-optimized kernels: an efficient alternative to hand-coded inner kernels
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part V
Optimizing data locality using array tiling
Proceedings of the International Conference on Computer-Aided Design
A cache-aware algorithm for PDEs on hierarchical data structures
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Cache-Oblivious algorithms and matrix formats for computations on interval matrices
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2