MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Memory interfacing and instruction specification for reconfigurable processors
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
Configuration relocation and defragmentation for run-time reconfigurable computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Proceedings of the 41st annual Design Automation Conference
Introduction of local memory elements in instruction set extensions
Proceedings of the 41st annual Design Automation Conference
LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Task scheduling for heterogeneous reconfigurable computers
SBCCI '04 Proceedings of the 17th symposium on Integrated circuits and system design
Rapid Configuration and Instruction Selection for an ASIP: A Case Study
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
The MOLEN Polymorphic Processor
IEEE Transactions on Computers
The Molen compiler for reconfigurable processors
ACM Transactions on Embedded Computing Systems (TECS)
Run-time instruction set selection in a transmutable embedded processor
Proceedings of the 45th annual Design Automation Conference
Run-time system for an extensible embedded processor with dynamic instruction set
Proceedings of the conference on Design, automation and test in Europe
Modern Operating Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fine- and Coarse-Grain Reconfigurable Computing
Fine- and Coarse-Grain Reconfigurable Computing
Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications
Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications
Efficient task scheduling for runtime reconfigurable systems
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Reconfigurable Processors utilize a reconfigurable fabric (to implement application-specific accelerators) and may perform run-time reconfigurations to exchange the set of deployed accelerators during application execution. Depending on the application requirements, the high utilization of the reconfigurable fabric (due to run-time reconfiguration) leads to a performance improvement compared to non-reconfigurable application-specific processors (ASIPs). However, as the reconfiguration time of fine-grained reconfigurable fabrics (i.e. FPGA-like structures) is rather long (in the range of milliseconds), it is crucial to avoid frequent cycles of reconfiguration-replacement-reconfiguration of the accelerators in order to exploit the real benefits of Reconfigurable Processors. Similar to memory caches, a replacement policy has to decide which reconfigurable accelerators shall be replaced in order to reconfigure additional accelerators. In the case that a recently replaced accelerator is demanded again, the reconfiguration delay might noticeably increase the application execution time. In this paper, we demonstrate that well-known policies for cache and page replacement (typically also used in state-of-the-art Reconfigurable Processors) are not generally suitable to replace reconfigurable accelerators. We therefore propose our novel performance-guided Minimum Degradation (MinDeg) replacement policy that particularly targets Reconfigurable Processors and replaces reconfigurable accelerators at run time. It accounts for the performance penalty that occurs due to replacement of a certain accelerator. Comparisons with the most-prominent replacement policies show the superiority of our approach. We evaluate and compare MinDeg for a wide range of different reconfiguration bandwidths and reconfigurable fabric sizes and achieve a speedup of up to 2.26x (1.74x compared to the widely used LRU policy). The introduced overhead to achieve this speedup is minor in comparison to the obtained application acceleration, i.e. the highest observed overhead (to calculate our MinDeg replacement policy) affected the obtained application acceleration by only 0.30%. A parallel hardware implementation of our MinDeg algorithm demands only 4,440 gate equivalents, which corresponds to 64% of the average requirements of one real-world reconfigurable accelerator (note: multiple accelerators are demanded per kernel). However, our MinDeg policy does not rely on hardware support, i.e. a trade-off between the hardware requirements and the acceleration is possible.