Design space exploration of an optimized compiler approach for a generic reconfigurable array architecture

Authors:
Grigoris Dimitroulakos;Michalis D. Galanis;Costas E. Goutis
Affiliations:
VLSI Design Laboratory, ECE Department, University of Patras, Patras, Greece;VLSI Design Laboratory, ECE Department, University of Patras, Patras, Greece;VLSI Design Laboratory, ECE Department, University of Patras, Patras, Greece
Venue:
The Journal of Supercomputing
Year:
2007

Abstract

Several mesh-like coarse-grained reconfigurable architectures have been devised in the last few years, together with their corresponding mapping flows. One of the major bottlenecks in mapping algorithms onto these architectures is the limited memory access bandwidth. Only a few mapping methodologies have addressed the limited-bandwidth problem, and none has explored how the resulting performance improvements are affected by the architectural characteristics. In this paper we study the impact of the architectural parameters on the performance speedups achieved when the PEs' local RAMs are used to store variables with data reuse opportunities. The reused data values are transferred over the internal interconnection network instead of being fetched from external memories, which reduces the data transfer burden on the bus network. A novel mapping algorithm that uses a list scheduling technique is also proposed. To permit this exploration, our mapping methodology targets a flexible architecture template. The experimental results quantify the trade-offs between the performance improvements and the memory access latency, the interconnection network, and the processing elements' local RAM size. More specifically, the experiments show that the improvements increase with the memory access latency, while a richer interconnection topology can improve the operation parallelism by a factor of 1.4 on average. Finally, for the considered set of benchmarks, the operation parallelism has been improved from 8.6% to 85.1% by applying our methodology with a local RAM of 8 words in each PE.
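
The interaction between list scheduling, memory bandwidth, and data reuse described in the abstract can be illustrated with a small sketch. This is not the authors' mapping algorithm: it is a generic list scheduler under simplifying assumptions (every operation takes one cycle, PEs are interchangeable, interconnect routing is ignored). The names `list_schedule`, `num_pes`, `mem_ports`, and `reused` are hypothetical and introduced only for illustration. Loads listed in `reused` are assumed to be served from a PE's local RAM, so they do not occupy a memory port.

```python
from collections import defaultdict


def list_schedule(ops, deps, num_pes, mem_ports, reused=frozenset()):
    """Return {op: cycle} for a DAG of one-cycle operations (assumption).

    ops:       {name: kind}, kind is "load" or "alu"
    deps:      {name: [predecessor names]}
    num_pes:   operations that may issue per cycle (must be >= 1)
    mem_ports: external-memory loads that may issue per cycle (must be >= 1)
    reused:    loads assumed to hit a PE-local RAM (no memory port needed)
    """
    succs = defaultdict(list)
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    # Priority: length of the longest path from an operation to a sink.
    height = {}

    def h(op):
        if op not in height:
            height[op] = 1 + max((h(s) for s in succs[op]), default=0)
        return height[op]

    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        # An operation is ready once all predecessors finished in earlier cycles.
        ready = [
            op for op in ops
            if op not in schedule
            and all(p in schedule and schedule[p] < cycle
                    for p in deps.get(op, []))
        ]
        ready.sort(key=lambda op: (-h(op), op))  # highest priority first
        pes_left, ports_left = num_pes, mem_ports
        for op in ready:
            if pes_left == 0:
                break
            needs_port = ops[op] == "load" and op not in reused
            if needs_port and ports_left == 0:
                continue  # bandwidth-limited: retry this load next cycle
            schedule[op] = cycle
            pes_left -= 1
            if needs_port:
                ports_left -= 1
        cycle += 1
    return schedule


# Toy kernel: (a + b) + (c + d), with four PEs and a single memory port.
ops = {"a": "load", "b": "load", "c": "load", "d": "load",
       "s1": "alu", "s2": "alu", "s3": "alu"}
deps = {"s1": ["a", "b"], "s2": ["c", "d"], "s3": ["s1", "s2"]}

no_reuse = list_schedule(ops, deps, num_pes=4, mem_ports=1)
with_reuse = list_schedule(ops, deps, num_pes=4, mem_ports=1,
                           reused={"b", "c", "d"})
print(max(no_reuse.values()) + 1, max(with_reuse.values()) + 1)
```

In this toy case the single memory port serializes the four loads when nothing is reused, whereas marking three loads as locally available lets the same scheduler fill the idle PEs. The sketch only shows the direction of the effect, not the magnitudes reported in the paper.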