ACM Transactions on Computer Systems (TOCS)
Memory storage patterns in parallel processing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Data optimization: allocation of arrays to reduce communication on SIMD machines
Journal of Parallel and Distributed Computing - Massively parallel computation
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Linear clustering of objects with multiple attributes
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The high performance Fortran handbook
Automatic data partitioning on distributed memory multicomputers
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimal evaluation of array expressions on massively parallel machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply
ACM Transactions on Programming Languages and Systems (TOPLAS)
Balancing processor loads and exploiting data locality in N-body simulations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic Partitioning of Non-Uniform Structured Workloads with Spacefilling Curves
IEEE Transactions on Parallel and Distributed Systems
Can parallel algorithms enhance serial implementation?
Communications of the ACM
The influence of caches on the performance of heaps
Journal of Experimental Algorithmics (JEA)
Analysis of the clustering properties of Hilbert space-filling curve
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
High performance Fortran for highly irregular problems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
Wavelets for computer graphics: theory and applications
Advanced compiler design and implementation
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Cache performance analysis of traversals and random accesses
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
LAPACK Users' guide (third ed.)
Tuning Strassen's matrix multiplication for memory efficiency
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Loop Transformations for Restructuring Compilers: The Foundations
Computer Solution of Large Sparse Positive Definite
Hierarchical tiling for improved superscalar performance
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Software methods for improvement of cache performance on supercomputer applications
Caches and algorithms
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Design and evaluation of a linear algebra package for Java
Proceedings of the ACM 2000 conference on Java Grande
Towards a theory of cache-efficient algorithms
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Transforming loops to recursion for multi-level memory hierarchies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Communications of the ACM
Efficient Representation Scheme for Multidimensional Array Operations
IEEE Transactions on Computers
Cache oblivious search trees via binary trees of small height
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests
International Journal of Parallel Programming
Precise Data Locality Optimization of Nested Loops
The Journal of Supercomputing
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Automatic Generation of Block-Recursive Codes
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Is Morton Layout Competitive for Large Two-Dimensional Arrays?
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance
WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Parallel Wavelet Transform for Large Scale Image Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Data cache locking for higher program predictability
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
IEEE Transactions on Parallel and Distributed Systems
Tiling, Block Data Layout, and Memory Hierarchy Performance
IEEE Transactions on Parallel and Distributed Systems
A Novel Implementation of Tile-Based Address Mapping
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
Mesh Partitioning Approach to Energy Efficient Data Layout
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Optimizing Graph Algorithms for Improved Cache Performance
IEEE Transactions on Parallel and Distributed Systems
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache-Conscious Automata for XML Filtering
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
The Opie compiler from row-major source to Morton-ordered matrices
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A fully portable high performance minimal storage hybrid format Cholesky algorithm
ACM Transactions on Mathematical Software (TOMS)
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
Cache-Efficient Multigrid Algorithms
International Journal of High Performance Computing Applications
An accurate cost model for guiding data locality transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A hierarchical model of data locality
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Cache-Conscious Automata for XML Filtering
IEEE Transactions on Knowledge and Data Engineering
NINJA: Java for high performance numerical computing
Scientific Programming
Fast indexing for blocked array layouts to reduce cache misses
International Journal of High Performance Computing and Networking
Towards many-core implementation of LU decomposition using Peano Curves
Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
Wavelet transform for large scale image processing on modern microprocessors
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Using non-canonical array layouts in dense matrix operations
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Evaluating ISA support and hardware support for recursive data layouts
HiPC'07 Proceedings of the 14th international conference on High performance computing
New data structures for matrices and specialized inner kernels: low overhead for high performance
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Porting existing cache-oblivious linear algebra HPC modules to Larrabee architecture
Proceedings of the 7th ACM international conference on Computing frontiers
ACM Transactions on Algorithms (TALG)
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A study on load imbalance in parallel hypermatrix multiplication using OpenMP
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
A cache oblivious algorithm for matrix multiplication based on Peano's space filling curve
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Adapting linear algebra codes to the memory hierarchy using a hypermatrix scheme
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Minimizing associativity conflicts in Morton layout
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Optimizing data locality using array tiling
Proceedings of the International Conference on Computer-Aided Design
A data layout optimization framework for NUCA-based multicores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Towards data tiling for whole programs in scratchpad memory allocation
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Dual-addressing memory architecture for two-dimensional memory access patterns
Proceedings of the Conference on Design, Automation and Test in Europe
Reshaping cache misses to improve row-buffer locality in multicore systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques