Design flow for optimizing performance in processor systems with on-chip coarse-grain reconfigurable logic

Authors:
Michalis D. Galanis;Gregory Dimitroulakos;Costas E. Goutis
Affiliations:
VLSI Design Lab., Electrical and Computer Engineering Department, University of Patras, Rio, Greece;VLSI Design Lab., Electrical and Computer Engineering Department, University of Patras, Rio, Greece;VLSI Design Lab., Electrical and Computer Engineering Department, University of Patras, Rio, Greece
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

A design flow for processor platforms with on-chip coarse-grain reconfigurable logic is presented. The reconfigurable logic is realized by a two-dimensional array of Processing Elements. Performance is improved by accelerating critical software loops, called kernels, on the Reconfigurable Array. Basic steps of the design flow have been automated. A procedure for detecting critical loops in the input C code was developed, and a mapping technique for Coarse-Grain Reconfigurable Arrays, based on software pipelining, was devised. Analytical results from mapping five real-life DSP applications onto eight different instances of a generic system architecture are presented. High Instructions Per Cycle (IPC) values were achieved on two Reconfigurable Arrays, indicating high-performance kernel mapping. Additionally, mapping critical code onto the reconfigurable logic yielded speedups ranging from 1.27 to 3.18 relative to all-processor execution.
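
The relationship between kernel selection and overall speedup can be illustrated with a minimal sketch. This is not the authors' detection procedure or mapping technique. It assumes a hypothetical profile with per-loop cycle counts, a hypothetical 10% threshold for calling a loop a kernel, and made-up cycle counts for the same loops on the reconfigurable array. Communication and reconfiguration overheads are ignored.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str
    cpu_cycles: int    # cycles spent in this loop on the processor (hypothetical profile data)
    array_cycles: int  # assumed cycles for the same loop on the reconfigurable array


def select_kernels(loops, total_cycles, threshold=0.10):
    """Return the loops whose share of total processor cycles is at least `threshold`."""
    return [loop for loop in loops if loop.cpu_cycles / total_cycles >= threshold]


def estimated_speedup(loops, total_cycles, threshold=0.10):
    """Speedup relative to all-processor execution when only the kernels are moved."""
    kernels = select_kernels(loops, total_cycles, threshold)
    saved = sum(k.cpu_cycles - k.array_cycles for k in kernels)
    return total_cycles / (total_cycles - saved)


# Illustrative numbers only; they are not taken from the paper.
loops = [
    Loop("fir_inner", 600_000, 100_000),
    Loop("fft_butterfly", 250_000, 50_000),
    Loop("init_tables", 50_000, 40_000),
]
total = 1_000_000  # total application cycles on the processor, including non-loop code

kernel_names = [k.name for k in select_kernels(loops, total)]
speedup = estimated_speedup(loops, total)
print(kernel_names, round(speedup, 2))
```

In this sketch, the code left on the processor bounds the achievable speedup, which is why the overall figure stays modest even when the kernels themselves run much faster on the array.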