Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Performance studies of Id on the Monsoon dataflow system
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
The specification of a new Manchester Dataflow machine
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Traffic analysis for on-chip networks design of multimedia applications
Proceedings of the 39th annual Design Automation Conference
IEEE Transactions on Parallel and Distributed Systems
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the conference on Design, automation and test in Europe - Volume 2
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Æthereal Network on Chip: Concepts, Architectures, and Implementations
IEEE Design & Test
Measuring the gap between FPGAs and ASICs
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Reducing control overhead in dataflow architectures
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Computer Systems (TOCS)
RISPP: rotating instruction set processing platform
Proceedings of the 44th annual Design Automation Conference
IEEE Transactions on Computers
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
RECONNECT: A NoC for polymorphic ASICs using a low overhead single cycle router
ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
Streaming FFT on REDEFINE-v2: an application-architecture design space exploration
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
RETHROTTLE: execution throttling in the REDEFINE SoC architecture
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Hi-index | 0.00 |
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization and low power dissipation, we propose REDEFINE, a polymorphic ASIC in which specialized hardware units are replaced with basic hardware units that can create the same functionality by runtime re-composition. It is a “future-proof” custom hardware solution for multiple applications and their derivatives in a domain. In this article, we describe a compiler framework and supporting hardware comprising compute, storage, and communication resources. Applications described in high-level language (e.g., C) are compiled into application substructures. For each application substructure, a set of compute elements on the hardware are interconnected during runtime to form a pattern that closely matches the communication pattern of that particular application. The advantage is that the bounded CEs are neither processor cores nor logic elements as in FPGAs. Hence, REDEFINE offers the power and performance advantage of an ASIC and the hardware reconfigurability and programmability of that of an FPGA/instruction set processor. In addition, the hardware supports custom instruction pipelining. Existing instruction-set extensible processors determine a sequence of instructions that repeatedly occur within the application to create custom instructions at design time to speed up the execution of this sequence. We extend this scheme further, where a kernel is compiled into custom instructions that bear strong producer-consumer relationship (and not limited to frequently occurring sequences of instructions). Custom instructions, realized as hardware compositions effected at runtime, allow several instances of the same to be active in parallel. A key distinguishing factor in majority of the emerging embedded applications is stream processing. To reduce the overheads of data transfer between custom instructions, direct communication paths are employed among custom instructions. In this article, we present the overview of the hardware-aware compiler framework, which determines the NoC-aware schedule of transports of the data exchanged between the custom instructions on the interconnect. The results for the FFT kernel indicate a 25% reduction in the number of loads/stores, and throughput improves by log(n) for n-point FFT when compared to sequential implementation. Overall, REDEFINE offers flexibility and a runtime reconfigurability at the expense of 1.16× in power and 8× in area when compared to an ASIC. REDEFINE implementation consumes 0.1× the power of an FPGA implementation. In addition, the configuration overhead of the FPGA implementation is 1,000× more than that of REDEFINE.