We investigate the problem of memory reuse in order to reduce the memory needed to store an array variable. We develop techniques that can lead to smaller memory requirements in the synthesis of dedicated processors or to more effective use by compiled code of software-controlled scratchpad memory. Memory reuse is well understood for allocating registers to hold scalar variables. Its extension to arrays has been studied recently for multimedia applications, for loop parallelization, and for circuit synthesis from recurrence equations. In all such studies, the introduction of modulo operations to an otherwise affine mapping (of loop or array indices to memory locations) achieves the desired reuse. We develop here a new mathematical framework, based on critical lattices, that subsumes the previous approaches and provides new insight. We first consider the set of indices that conflict, that is, those that cannot be mapped to the same memory cell. Next, we construct the set of differences of conflicting indices. We establish a correspondence between a valid modular mapping and a strictly admissible integer lattice, that is, one having no nonzero element in common with the set of conflicting index differences. The memory required by an optimal modular mapping is equal to the determinant of the corresponding lattice. The memory reuse problem is thus reduced to the (still interesting and nontrivial) problem of finding a strictly admissible integer lattice of least determinant. We then propose and analyze several practical strategies for finding strictly admissible integer lattices, either optimal or optimal up to a multiplicative factor, and, hence, memory-saving modular mappings. We explain and analyze previous approaches in terms of our new framework.
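The validity condition can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a small, made-up set of conflicting index differences (a box with |d1| <= 1 and |d2| <= 2), restricts attention to one-dimensional mappings of the form i -> (a . i) mod m, and finds the smallest modulus by brute force. The names `is_valid`, `smallest_linear_mapping`, and `diffs` are hypothetical and chosen only for this illustration.

```python
from itertools import product

def is_valid(a, m, diffs):
    """Return True if i -> (a . i) mod m maps no nonzero conflict difference to 0.

    Equivalently, the kernel lattice {i : a . i = 0 (mod m)} is strictly
    admissible for the difference set `diffs`.
    """
    return all(
        sum(x * y for x, y in zip(a, d)) % m != 0
        for d in diffs
        if any(d)  # skip the zero vector
    )

def smallest_linear_mapping(diffs, max_m):
    """Brute-force the least modulus m, and a vector a, giving a valid mapping.

    Only 1-D modular mappings are searched (an assumption of this sketch).
    Returns (m, a), or None if nothing is found up to max_m.
    """
    n = len(next(iter(diffs)))
    for m in range(1, max_m + 1):
        for a in product(range(m), repeat=n):
            if is_valid(a, m, diffs):
                return m, a
    return None

# Hypothetical conflict differences: indices conflict when they differ by at
# most 1 in the first coordinate and at most 2 in the second.
diffs = {(d1, d2) for d1 in range(-1, 2) for d2 in range(-2, 3)}

m, a = smallest_linear_mapping(diffs, 12)
print(m, a)
```

For this difference set the search returns a modulus of 6. That matches the lower bound: the six indices of a 2 x 3 block conflict pairwise, so at least six memory cells are needed. Because the mapping found is onto the integers modulo 6, its kernel lattice has determinant 6, in line with the statement that the memory required equals the determinant of the corresponding lattice.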