To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Authors:
O. Temam;E. D. Granston;W. Jalby
Affiliations:
High Perfomace Computing Division, Department of Computer Science, Leiden University, Niels Bohrweg 1, 2333, CA Leiden, The Netherlands;High Perfomace Computing Division, Department of Computer Science, Leiden University, Niels Bohrweg 1, 2333, CA Leiden, The Netherlands;University de Versailles, 78000 Versailles, France
Venue:
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Year:
1993

Citing 8
Cited 68

On the problem of optimizing data transfers for complex memory systems

ICS '88 Proceedings of the 2nd international conference on Supercomputing
More iteration space tiling

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Alpha architecture reference manual

Alpha architecture reference manual
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
On Estimating and Enhancing Cache Effectiveness

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies

Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing

MOB forms: a class of multilevel block algorithms for dense linear algebra operations

ICS '94 Proceedings of the 8th international conference on Supercomputing
Cache interference phenomena

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Data relocation and prefetching for programs with large data sets

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Skewed associativity enhances performance predictability

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Unified compilation techniques for shared and distributed address space machines

ICS '95 Proceedings of the 9th international conference on Supercomputing
A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Non-singular data transformations: definition, validity and applications

ICS '97 Proceedings of the 11th international conference on Supercomputing
A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems

IEEE Transactions on Parallel and Distributed Systems
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance

IEEE Transactions on Computers - Special issue on cache memory and related problems
Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A tile selection algorithm for data locality and cache interference

ICS '99 Proceedings of the 13th international conference on Supercomputing
Cache miss equations: a compiler framework for analyzing and tuning memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonsingular Data Transformations: Definition, Validity, and Applications

International Journal of Parallel Programming
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
Architecture-cognizant divide and conquer algorithms

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Matrix multiplication: a case study of enhanced data cache utilization

Journal of Experimental Algorithmics (JEA)
Tiling optimizations for 3D scientific computations

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting non-uniform reuse for cache optimization

Proceedings of the 2001 ACM symposium on Applied computing
The Impulse Memory Controller

IEEE Transactions on Computers
Register tiling in nonrectangular iteration spaces

ACM Transactions on Programming Languages and Systems (TOPLAS)
Achieving Scalable Locality with Time Skewing

International Journal of Parallel Programming
Skewed Associativity Improves Program Performance and Enhances Predictability

IEEE Transactions on Computers
A Modal Model of Memory

ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
False Sharing Elimination by Selection of Runtime Scheduling Parameters

ICPP '97 Proceedings of the international Conference on Parallel Processing
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Cache Remapping to Improve the Performance of Tiled Algorithms

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Fractal Matrix Multiplication: A Case Study on Portability of Cache Performance

WAE '01 Proceedings of the 5th International Workshop on Algorithm Engineering
Cache Line Impact on 3D PDE Solvers

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Array Unification: A Locality Optimization Technique

CC '01 Proceedings of the 10th International Conference on Compiler Construction
On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Data cache locking for higher program predictability

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Access ordering and memory-conscious cache utilization

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Tiling, Block Data Layout, and Memory Hierarchy Performance

IEEE Transactions on Parallel and Distributed Systems
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications

IEEE Transactions on Computers
Data Caches in Multitasking Hard Real-Time Systems

RTSS '03 Proceedings of the 24th IEEE International Real-Time Systems Symposium
A fast and accurate framework to analyze and optimize cache memory behavior

ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior

IEEE Transactions on Computers
Quasidynamic Layout Optimizations for Improving Data Locality

IEEE Transactions on Parallel and Distributed Systems
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy

Proceedings of the international symposium on Code generation and optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Locality-conscious workload assignment for array-based computations in MPSOC architectures

Proceedings of the 42nd annual Design Automation Conference
An accurate cost model for guiding data locality transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler-directed high-level energy estimation and optimization

ACM Transactions on Embedded Computing Systems (TECS)
Automatic benchmark generation for cache optimization of matrix operations

ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
Empirical optimization for a sparse linear solver: a case study

International Journal of Parallel Programming - Special issue: The next generation software program
Impulse: Memory system support for scientific applications

Scientific Programming
A memory-conscious code parallelization scheme

Proceedings of the 44th annual Design Automation Conference
Exploring the performance limits of simultaneous multithreading for memory intensive applications

The Journal of Supercomputing
Fast indexing for blocked array layouts to reduce cache misses

International Journal of High Performance Computing and Networking
Dynamic tiling for effective use of shared caches on multithreaded processors

International Journal of High Performance Computing and Networking
Revisiting Cache Block Superloading

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Algorithms for memory hierarchies: advanced lectures

Algorithms for memory hierarchies: advanced lectures
Dependence-based code generation for a CELL processor

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Using non-canonical array layouts in dense matrix operations

PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
New data structures for matrices and specialized inner kernels: low overhead for high performance

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness

CASCON First Decade High Impact Papers
A programming language interface to describe transformations and code generation

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Applying data copy to improve memory performance of general array computations

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Optimizing matrix multiplication with a classifier learning system

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Tuning blocked array layouts to exploit memory hierarchy in SMT architectures

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Loop transformation recipes for code generation and auto-tuning

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Cache-sensitive MapReduce DGEMM algorithms for shared memory architectures

Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference
Tile size selection revisited

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.01

To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Quantified Score

Visualization

Abstract