Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays

Authors:
Grigorios Dimitroulakos;Nikos Kostaras;Michalis D. Galanis;Costas E. Goutis
Affiliations:
VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, Patras, Greece;Department of Computer Engineering and Informatics, University of Patras, Patras, Greece;VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, Patras, Greece;VLSI Design Laboratory, Electrical and Computer Engineering Department, University of Patras, Patras, Greece
Venue:
The Journal of Supercomputing
Year:
2009

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures have been used extensively to accelerate time-consuming loops. Designing such systems requires a good balance between the capabilities of the architecture and the characteristics of the loops. A sound design is characterized by an optimized cost-performance trade-off. The main goal of this paper is to present an exploration framework that automates the evaluation of CGRA architectures. Specifically, the framework helps the designer identify CGRA architectures tuned to a specific application domain. The process is assisted by (1) an optimized retargetable compiler based on modulo scheduling and (2) the Synopsys Design Compiler, which provides implementation metrics such as area and clock frequency. Both tools operate on the description of a parametric CGRA architecture template that can instantiate a wide variety of these architectures. Many studies to date suggest that clock frequency influences performance; however, none of them examines the impact of the architecture on clock frequency and performance. Our work is the first to study area, clock frequency, instructions per cycle, and performance in a unified way, so that architectures with a good compromise between cost and performance can be identified. Another objective of the paper is to present the advances made to the compiler approach used by the exploration framework. Specifically, a new, more effective priority scheme is proposed, and the modulo scheduler has been equipped with backtracking capability. The experiments demonstrate the algorithm's efficiency and scalability on a given set of DSP benchmarks. Moreover, architectures optimized with respect to the cost-performance trade-off have been identified through an exploration of 72 CGRA architecture alternatives.
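
The cost-performance selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' framework. It assumes that performance can be approximated as instructions per cycle multiplied by clock frequency, and that cost is the synthesized area. The `Candidate` type, the `pareto_front` helper, the architecture names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical CGRA instance with its measured metrics."""
    name: str
    area_mm2: float    # cost metric, e.g. as reported by a synthesis tool
    clock_mhz: float   # achievable clock frequency
    ipc: float         # instructions per cycle, e.g. as reported by a compiler

    @property
    def throughput_mops(self) -> float:
        # Assumed performance model: operations per microsecond = IPC * MHz.
        return self.ipc * self.clock_mhz


def pareto_front(candidates):
    """Return the candidates not dominated in (lower area, higher throughput)."""
    front = []
    for c in candidates:
        dominated = any(
            o.area_mm2 <= c.area_mm2
            and o.throughput_mops >= c.throughput_mops
            and (o.area_mm2 < c.area_mm2 or o.throughput_mops > c.throughput_mops)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.area_mm2)


# Illustrative values only; they are not results from the paper.
candidates = [
    Candidate("4x4-mesh", area_mm2=1.0, clock_mhz=200.0, ipc=4.0),
    Candidate("4x4-rich", area_mm2=1.5, clock_mhz=150.0, ipc=5.0),
    Candidate("6x6-mesh", area_mm2=2.0, clock_mhz=180.0, ipc=8.0),
    Candidate("8x8-mesh", area_mm2=4.0, clock_mhz=120.0, ipc=12.0),
]

front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

In this toy data set the largest array has the highest IPC but a lower clock frequency, so it delivers no more throughput than a smaller array and is filtered out. This is the kind of interaction between architecture, clock frequency, and performance that the abstract argues must be evaluated jointly.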