Architecture for transparent binary acceleration of loops with memory accesses

Authors:
Nuno Paulino;João Canas Ferreira;João M. P. Cardoso
Affiliations:
INESC TEC and Faculty of Engineering, University of Porto, Portugal;INESC TEC and Faculty of Engineering, University of Porto, Portugal;INESC TEC and Department of Informatics Engineering, Faculty of Engineering, University of Porto, Portugal
Venue:
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
Year:
2013

Abstract

This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are accelerated by a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain that automatically generates an RPU tailored to the execution of one or more Megablocks detected offline. Switching between hardware and software execution is done transparently, without modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. By using a memory sharing mechanism, the RPU can access the GPP's data memory, allowing the acceleration of Megablocks with load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43× is achieved, and a potential speedup of 2.09× is predicted for an implementation using a low overhead interface for communication between GPP and RPU.
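
To make the notion of a "repetitive instruction trace" concrete, the following is a minimal sketch of how a back-to-back repeating pattern might be located in a recorded trace of instruction addresses. It is an illustration only and not the authors' Megablock detection algorithm: the function name `find_repeating_trace`, the parameters `max_len` and `min_repeats`, and the brute-force search are all assumptions introduced here.

```python
from typing import List, Optional, Tuple


def find_repeating_trace(
    trace: List[int],
    max_len: int = 8,       # hypothetical bound on pattern length
    min_repeats: int = 3,   # hypothetical threshold for "repetitive"
) -> Optional[Tuple[int, List[int], int]]:
    """Return (start, pattern, repeats) for the back-to-back repeating
    pattern covering the most trace entries, or None if none qualifies."""
    best: Optional[Tuple[int, List[int], int]] = None
    n = len(trace)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length > n:
                break
            pattern = trace[start:start + length]
            repeats = 1
            pos = start + length
            # Count how many times the pattern repeats consecutively.
            while trace[pos:pos + length] == pattern:
                repeats += 1
                pos += length
            if repeats >= min_repeats:
                covered = repeats * length
                # Keep the first candidate with the largest coverage.
                if best is None or covered > best[2] * len(best[1]):
                    best = (start, pattern, repeats)
    return best


# Example: a three-instruction loop body executed three times.
example_trace = [
    0x100, 0x104,
    0x200, 0x204, 0x208,
    0x200, 0x204, 0x208,
    0x200, 0x204, 0x208,
    0x300,
]
result = find_repeating_trace(example_trace)
```

In this example the three addresses starting at `0x200` form the repeating pattern. A real detector would work on decoded instructions and control-flow information rather than raw addresses, so this sketch only conveys the general idea.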