A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Minimizing the required memory bandwidth in VLSI system realizations
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reconfigurable computing: a survey of systems and software
ACM Computing Surveys (CSUR)
Compiler-directed scratch pad memory hierarchy design and management
Proceedings of the 39th annual Design Automation Conference
Storage Management Programmable Process
Storage Management Programmable Process
Automatic Allocation of Arrays to Memories in FPGA Processors with Multiple Memory Banks
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Storage requirement estimation for optimized design of data intensive applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Input data reuse in compiling window operations onto reconfigurable hardware
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Analyzing data reuse for cache reconfiguration
ACM Transactions on Embedded Computing Systems (TECS)
Storage assignment during high-level synthesis for configurable architectures
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Compiler directed data management for configurable architectures with heterogeneous memory structures
Extending the applicability of scalar replacement to multiple induction variables
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Finding the best compromise in compiling compound loops to Verilog
Journal of Systems Architecture: the EUROMICRO Journal
Automatic memory partitioning and scheduling for throughput and power optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Configurable architectures offer the unique opportunity of realizing hardware designs tailored to the specific data and computational patterns of an application code. Customizing the storage structures is becoming increasingly important in mitigating the continuing gap between memory latencies and internal computing speeds. In this article we describe and evaluate a compiler algorithm that maps the arrays of a loop-based computation to internal storage structures, either RAM blocks or discrete registers. Our objective is to minimize the overall execution time while considering the capacity and bandwidth constraints of the storage resources. The novelty of our approach lies in creating a single framework that combines high-level compiler techniques with lower-level scheduling information for mapping the data. We illustrate the benefits of our approach for a set of image/signal processing kernels using a Xilinx Virtex™ Field-Programmable Gate Array (FPGA). Our algorithm leads to faster designs compared to the state-of-the-art custom data layout mapping technique, in some instances using less storage. When compared to hand-coded designs, our results are comparable in terms of execution time and resources, but are derived in a minute fraction of the design time.