A compiler approach to managing storage and memory bandwidth in configurable architectures

Authors:
Nastaran Baradaran;Pedro C. Diniz
Affiliations:
University of Southern California/Information Sciences Institute, Los Angeles, California;Instituto Superior Técnico/Technical University of Lisbon/INESC-ID
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2008

Abstract

Configurable architectures offer the unique opportunity of realizing hardware designs tailored to the specific data and computational patterns of an application code. Customizing the storage structures is becoming increasingly important in mitigating the continuing gap between memory latencies and internal computing speeds. In this article we describe and evaluate a compiler algorithm that maps the arrays of a loop-based computation to internal storage structures, either RAM blocks or discrete registers. Our objective is to minimize the overall execution time while considering the capacity and bandwidth constraints of the storage resources. The novelty of our approach lies in creating a single framework that combines high-level compiler techniques with lower-level scheduling information for mapping the data. We illustrate the benefits of our approach for a set of image/signal processing kernels using a Xilinx Virtex™ Field-Programmable Gate Array (FPGA). Our algorithm leads to faster designs compared to the state-of-the-art custom data layout mapping technique, in some instances using less storage. When compared to hand-coded designs, our results are comparable in terms of execution time and resources, but are derived in a minute fraction of the design time.
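
The mapping problem summarized in the abstract can be illustrated with a small sketch. This is not the algorithm from the article; it is a toy greedy heuristic under stated assumptions. The names `Array`, `map_arrays`, and `cycles_per_iteration`, the register budget, and the two-port RAM model are all hypothetical. It assumes each RAM-mapped array gets its own RAM block, that registers can all be read in the same cycle, and that RAM capacity is not limiting.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Array:
    name: str
    size: int      # elements that must be held on chip (assumed known)
    accesses: int  # accesses per loop iteration (assumed known)


def map_arrays(arrays, register_budget):
    """Toy greedy mapping: registers for the most densely accessed arrays, RAM for the rest."""
    mapping = {}
    free = register_budget
    # Prefer arrays with many accesses per element of storage they occupy.
    for a in sorted(arrays, key=lambda a: a.accesses / a.size, reverse=True):
        if a.size <= free:
            mapping[a.name] = "registers"
            free -= a.size
        else:
            mapping[a.name] = "ram"
    return mapping


def cycles_per_iteration(arrays, mapping, ram_ports=2):
    """Memory-bound cycles per iteration, assuming one RAM block per RAM-mapped array."""
    ram_cycles = [
        ceil(a.accesses / ram_ports) for a in arrays if mapping[a.name] == "ram"
    ]
    # With no RAM-mapped arrays, every operand is available in a single cycle.
    return max(ram_cycles, default=1)


# Hypothetical kernel: a small coefficient table, a sliding window, a large image.
arrays = [Array("coeff", 8, 8), Array("window", 16, 8), Array("image", 4096, 4)]
all_ram = {a.name: "ram" for a in arrays}
mixed = map_arrays(arrays, register_budget=24)
```

In this made-up example, moving the two small, heavily accessed arrays into registers leaves only the image in RAM, so the port-limited cycle count per iteration drops relative to the all-RAM mapping. A real compiler pass would also need the scheduling information and RAM capacity limits that the abstract mentions.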