Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A datapath synthesis system for the reconfigurable datapath architecture
ASP-DAC '95 Proceedings of the 1995 Asia and South Pacific Design Automation Conference
IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Storage Management Programmable Process
Storage Management Programmable Process
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Alleviating the Data Memory Bandwidth Bottleneck in Coarse-Grained Reconfigurable Arrays
ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors
Compiler assisted architectural exploration for coarse grained reconfigurable arrays
Proceedings of the 17th ACM Great Lakes symposium on VLSI
A unified evaluation framework for coarse grained reconfigurable array architectures
Proceedings of the 4th international conference on Computing frontiers
Speedups and energy reductions from mapping DSP applications on an embedded reconfigurable system
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
EPIMap: using epimorphism to map applications on CGRAs
Proceedings of the 49th Annual Design Automation Conference
Performance optimization of embedded applications in a hybrid reconfigurable platform
PATMOS'07 Proceedings of the 17th international conference on Integrated Circuit and System Design: power and timing modeling, optimization and simulation
REGIMap: register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs)
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
In this paper we study the performance improvements and trade-offs derived from an optimized mapping approach applied on a parametric coarse grained reconfigurable array architecture. The processing elements' local register files and the processing elements' interconnection network is exploited for caching memory data values with data reuse opportunities. The data reused values are transferred through the processing elements' interconnection network hence, relieving the bus from the burden of transferring these values. A novel mapping algorithm is also proposed that uses a modulo scheduling technique. This algorithm targets on a flexible architecture template which permits experimental exploration over different architecture alternatives. The experimental results showed that the operation parallelism was significantly improved by our mapping approach. Additionally, we have outlined the relation that exists between the performance improvements and the memory access latency, the interconnection network and the processing elements' register file size.