REDEFINE: Runtime reconfigurable polymorphic ASIC

Authors:
Mythri Alle;Keshavan Varadarajan;Alexander Fell;Ramesh Reddy C.;Nimmy Joseph;Saptarsi Das;Prasenjit Biswas;Jugantor Chetia;Adarsh Rao;S. K. Nandy;Ranjani Narayan
Affiliations:
CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;CAD Lab, SERC, Indian Institute of Science, Bangalore;Morphing Machines, Bangalore, India
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2009

Abstract

Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE 802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single-chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a “future-proof” custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in a high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements (CEs) on the hardware are interconnected at runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bound CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantages of an ASIC together with the hardware reconfigurability and programmability of an FPGA or instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors identify sequences of instructions that occur repeatedly within the application and, at design time, create custom instructions that speed up the execution of those sequences. We extend this scheme: a kernel is compiled into custom instructions that bear a strong producer-consumer relationship (and are not limited to frequently occurring sequences of instructions).
Because custom instructions are realized as hardware compositions effected at runtime, several instances of the same custom instruction can be active in parallel. A key distinguishing factor in the majority of emerging embedded applications is stream processing. To reduce the overhead of data transfer between custom instructions, direct communication paths are employed among them. In this article, we present an overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by a factor of log(n) for an n-point FFT when compared to a sequential implementation. Overall, REDEFINE offers flexibility and runtime reconfigurability at a cost of 1.16× the power and 8× the area of an ASIC. The REDEFINE implementation consumes 0.1× the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000× higher than that of REDEFINE.
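
One way to read the log(n) throughput figure is as a pipelining effect. A radix-2 n-point FFT has log2(n) butterfly stages, and if each stage is held by its own hardware composition, a stream of input frames can occupy all stages at once. The Python sketch below is an idealized model built on that assumption, not the article's evaluation method: the function names, the one-time-unit-per-stage cost, and the absence of stalls are all hypothetical.

```python
def fft_stage_counts(n):
    """Return (stages, butterflies_per_stage) for a radix-2 n-point FFT."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1      # log2(n) for a power of two
    return stages, n // 2


def streaming_speedup(n, frames):
    """Idealized speedup of a stage-pipelined FFT over a sequential one.

    Assumption (not from the article): every stage costs one time unit
    per frame and the pipeline never stalls.
    """
    if frames < 1:
        raise ValueError("frames must be at least 1")
    stages, _ = fft_stage_counts(n)
    sequential = frames * stages         # one frame at a time, stage by stage
    pipelined = stages + (frames - 1)    # fill the pipeline, then one frame per unit
    return sequential / pipelined


if __name__ == "__main__":
    for frames in (1, 10, 1000, 1_000_000):
        print(frames, round(streaming_speedup(1024, frames), 3))
```

Under these assumptions a single frame sees no benefit, while a long stream approaches a speedup of log2(n), which is consistent with the abstract's emphasis on stream processing.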