Re-configurable computing in wireless
Proceedings of the 38th annual Design Automation Conference
Loop Pipelining and Optimization for Run Time Reconfiguration
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Mapping Loops onto Reconfigurable Architectures
FPL '98 Proceedings of the 8th International Workshop on Field-Programmable Logic and Applications, From FPGAs to Computing Paradigm
Re-configurable computing in wireless
Proceedings of the 38th annual Design Automation Conference
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
An algorithm for mapping loops onto coarse-grained reconfigurable architectures
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Optimizing code parallelization through a constraint network based approach
Proceedings of the 43rd annual Design Automation Conference
High-level synthesis challenges and solutions for a dynamically reconfigurable processor
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Slicing based code parallelization for minimizing inter-processor communication
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
FleXilicon architecture and its VLSI implementation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Data locality and parallelism optimization using a constraint-based approach
Journal of Parallel and Distributed Computing
Improving performance of nested loops on reconfigurable array processors
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Hi-index | 0.00 |
Reconfigurable architectures promise significant performance and flexibility advantages over conventional architectures. Automatic mapping techniques that exploit the features of the hardware are needed to leverage the power of these architectures. In this paper, we develop techniques for parallelizing nested loop computations from digital signal processing (DSP) applications onto high performance pipelined configurations. We propose a novel data context switching technique that exploits the embedded distributed memory available in reconfigurable architectures to parallelize such loops. Our technique is demonstrated on two diverse state-of-the-art reconfigurable architectures, namely, Virtex and the Chameleon Systems Reconfigurable Communications Processor. Our techniques show significant performance improvements on both architectures and also perform better than state-of-the-art DSP and microprocessor architectures.