Unlike desktop and server CPUs, special-purpose processors in embedded systems and on graphics cards often lack a hardware-managed cache. Instead, they offer a scratchpad memory, which is as fast as a cache but must be managed explicitly: the burden of using it efficiently falls on the software. We present a method for computing precisely which memory cells are reused due to temporal locality in a certain class of codes, namely those that can be modelled in the well-known polyhedron model. Examples on scientific codes demonstrate the effectiveness of our method.
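
To make the notion of temporal reuse concrete, the following minimal sketch lists the memory cells that more than one iteration of a small loop nest touches. It is an illustration only and not the method of this paper: it enumerates every iteration by brute force, whereas an analysis in the polyhedron model would presumably describe the same set symbolically. The helper `reused_cells`, the loop bounds and the access function `i + j` are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def reused_cells(bounds, access):
    """Map each cell touched by more than one iteration to those iterations.

    bounds: upper bounds of a rectangular loop nest (each loop runs 0..n-1).
    access: function from loop indices to the accessed cell (assumed affine).
    """
    touched = defaultdict(list)
    # Visit iterations in execution order: outermost loop varies slowest.
    for point in product(*(range(n) for n in bounds)):
        touched[access(*point)].append(point)
    # A cell exhibits temporal reuse if at least two iterations access it.
    return {cell: its for cell, its in touched.items() if len(its) > 1}


# Hypothetical nest: for i in 0..3: for j in 0..2: ... A[i + j] ...
reuse = reused_cells((4, 3), lambda i, j: i + j)
print(sorted(reuse))  # [1, 2, 3, 4]
print(reuse[2])       # [(0, 2), (1, 1), (2, 0)]
```

Cells 0 and 5 are each touched only once, so keeping them in a scratchpad would gain nothing, while cells 1 to 4 are candidates for being held there between their first and last access.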