Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
A datapath synthesis system for the reconfigurable datapath architecture
ASP-DAC '95 Proceedings of the 1995 Asia and South Pacific Design Automation Conference
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Formalized methodology for data reuse exploration for low-power hierarchical memory mappings
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Introduction to Algorithms
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Implementing an OFDM Receiver on the RaPiD Reconfigurable Architecture
IEEE Transactions on Computers
Register File Architecture Optimization in a Coarse-Grained Reconfigurable Architecture
FCCM '05 Proceedings of the 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Partitioning Methodology for Heterogeneous Reconfigurable Functional Units
The Journal of Supercomputing
Speedups and energy reductions from mapping DSP applications on an embedded reconfigurable system
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Data-driven regular reconfigurable arrays: design space exploration and mapping
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
Hi-index | 0.00 |
Coarse Grain Reconfigurable Array (CGRA) architectures have been extensively used for accelerating time consuming loops. The design of such systems requires good balance between the architecture abilities and the loops' characteristics. A reliable design is characterized by optimized cost-performance trade-off. The main target of this paper is to present an exploration framework that automates the evaluation of CGRA architectures. In specific, the framework helps the designer to identify CGRA architectures tuned toward a specific application domain. The whole process is assisted: (1) by an optimized retargetable compiler based on modulo scheduling and (2) by the Synopsys Design Compiler that provides realization metrics such as the area and clock frequency. Both target on the description of a parametric CGRA architecture template which is capable of instantiating a large diversity of these architectures. Until now, many studies suggest that clock frequency influences performance. However, none of them examines the impact of architecture on clock frequency and performance. Our work studies in a unified way for the first time the area, the clock frequency, the instructions per cycle and performance. Hence, architectures with good compromise between cost and performance can be identified. Another objective of the paper is to present the advances made to the compiler approach used by the exploration framework. In specific, a new more effective priority scheme is proposed while the modulo scheduler has been equipped with backtracking capability. The experiments outline the algorithm's efficiency and scalability for a given set of DSP benchmarks. Moreover, optimized architectures with respect to cost-performance trade-off have been identified by an exploration over 72 CGRA architecture alternatives.