Compiler directed data management for configurable architectures with heterogeneous memory structures

  • Authors:
  • Pedro C. Diniz; Nastaran Baradaran

  • Affiliations:
  • University of Southern California; University of Southern California

  • Venue:
  • Compiler directed data management for configurable architectures with heterogeneous memory structures
  • Year:
  • 2007

Abstract

Configurable architectures offer a unique opportunity to realize hardware designs tailored to the specific data and computational patterns of a given application code. These devices have customizable compute fabric, interconnects, and memory subsystems that allow for large amounts of data and computational parallelism. This high degree of concurrency in turn translates to better performance. The flexibility and configurability of these architectures, however, create a prohibitively large design space when mapping computations expressed in high-level programming languages to these devices. Finding the best mapping requires high-level program analyses and abstractions as well as automated tools. This dissertation describes a high-level approach to one of these mapping problems, namely the allocation and management of storage. We develop and evaluate automatic mapping algorithms that can quickly and effectively explore alternative mapping strategies. Our objective is to minimize the overall execution time while respecting the capacity and bandwidth constraints of the storage structures. Our approach combines compiler analyses with behavioral synthesis information in order to map the arrays of a loop-based computation to an architecture with a set of internal memories. In particular, for each computation we consider the access and reuse patterns of the data arrays, the structure of the critical paths, the scheduling information of the synthesis tool, and the storage and bandwidth constraints of the target architecture. We utilize three mapping techniques: data distribution, data replication, and scalar replacement. We further consider three levels of storage: off-chip memory, on-chip memory banks, and on-chip registers. We illustrate the effects of applying our analyses and mapping algorithm to a set of image/signal processing kernel codes using a Xilinx Virtex™ FPGA.
The novelty of our approach lies in creating a single framework that combines various high-level compiler analyses and data transformations with lower-level scheduling information in order to map the data. Our experimental results show that our approach is very effective in finding high-quality data mappings to the storage structures of an FPGA in an automated fashion. Considering the current tendency towards increasing the variety and capacity of controllable storage structures, and given the continuing gap between computation and data access latencies, effective data management becomes an essential factor in achieving high performance in future architectures.
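
Of the mapping techniques named in the abstract, scalar replacement is the easiest to show in a few lines. The C sketch below is not taken from the dissertation: the 4-tap FIR kernel, the function names, the `TAPS` constant, and the `mem_reads` counter (which stands in for accesses to an on-chip memory bank) are all assumptions made for illustration. It contrasts a loop that re-reads the input array for every tap with a version that keeps the reused window in scalars, which a synthesis tool could plausibly map to registers.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4 /* hypothetical filter length, fixed so the window fits in 4 scalars */

/* Simulated count of reads from the memory bank that holds x[]. */
static unsigned long mem_reads = 0;

/* Every access to x[] goes through load() so the reads can be counted.
   Coefficient reads are not counted; they are assumed to sit in registers. */
static int load(const int *x, size_t i) {
    mem_reads++;
    return x[i];
}

/* Baseline: each output re-reads all TAPS inputs from memory. */
void fir_baseline(const int *x, const int *c, int *y, size_t n) {
    assert(n >= TAPS);
    for (size_t i = 0; i + TAPS <= n; i++) {
        int acc = 0;
        for (size_t k = 0; k < TAPS; k++)
            acc += c[k] * load(x, i + k);
        y[i] = acc;
    }
}

/* Scalar-replaced: the sliding window lives in w0..w3, so each
   iteration reads only the one input that was not seen before. */
void fir_scalar_replaced(const int *x, const int *c, int *y, size_t n) {
    assert(n >= TAPS);
    int w0 = load(x, 0), w1 = load(x, 1), w2 = load(x, 2);
    for (size_t i = 0; i + TAPS <= n; i++) {
        int w3 = load(x, i + 3);
        y[i] = c[0] * w0 + c[1] * w1 + c[2] * w2 + c[3] * w3;
        /* Rotate the window: reuse three values instead of reloading them. */
        w0 = w1;
        w1 = w2;
        w2 = w3;
    }
}
```

Both functions produce the same outputs. For an 8-element input, the baseline issues 20 reads of `x` while the scalar-replaced version issues 8, which is the kind of bandwidth reduction the abstract's capacity and bandwidth trade-off refers to. The cost is extra register storage, here three additional scalars.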