This thesis presents a compilation framework for translating ANSI C programs into hardware dataflow machines. The framework is embodied in the CASH compiler, a Compiler for Application-Specific Hardware. CASH generates asynchronous hardware circuits that directly implement the functionality of the source program, without using any interpretative structures. This style of computation is dubbed “Spatial Computation.” CASH relies extensively on predication and speculation for building efficient hardware circuits. The first part of this document describes Pegasus, the internal representation of CASH, and a series of novel program transformations performed by CASH. The most notable of these are a new optimal register-promotion algorithm and partial redundancy elimination for memory accesses based on predicate manipulation. The second part of this document evaluates the performance of the generated circuits using simulation. Using media processing benchmarks, we show that for the domain of embedded computation, the circuits generated by CASH can sustain high levels of instruction-level parallelism, due to the effective use of dataflow software pipelining. A comparison of Spatial Computation and superscalar processors highlights some of the weaknesses of our model of computation, such as the lack of branch prediction and register renaming. Low-level simulation, however, suggests that the energy efficiency of Application-Specific Hardware is three orders of magnitude better than that of superscalar processors, one order of magnitude better than that of low-power digital signal processors and asynchronous processors, and approaches that of custom hardware chips.
The results presented in this document can be applied in several domains: (1) most of the compiler optimizations are applicable to traditional compilers for high-level languages; (2) CASH itself can be used as a hardware synthesis tool for very fast system-on-a-chip prototyping directly from C sources; (3) the compilation framework we describe can be applied to the translation of imperative languages to dataflow machines; (4) we have extended the dataflow machine model to encompass predication, data-speculation and control-speculation; and (5) the tool-chain described and some specific optimizations, such as lenient execution and pipeline balancing, can be used for synthesis and optimization of asynchronous hardware.
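As a rough source-level illustration of what predication and speculation mean, the sketch below writes the same computation twice: once with a branch, and once in a form where both arms are computed unconditionally and a predicate selects the result, much as a multiplexor would in a circuit. This is not CASH output or Pegasus notation; the function names are hypothetical, and the example assumes side-effect-free arithmetic whose operands do not overflow, which is what makes speculating both arms safe.

```c
#include <stdint.h>

/* Control-flow version: only one arm of the branch executes. */
int32_t abs_diff_branch(int32_t a, int32_t b)
{
    if (a > b)
        return a - b;
    return b - a;
}

/* Predicated version (hypothetical sketch, not generated by CASH):
 * both arms are evaluated speculatively, then the predicate picks one. */
int32_t abs_diff_predicated(int32_t a, int32_t b)
{
    int32_t p = (a > b);          /* predicate guarding the "then" arm   */
    int32_t then_val = a - b;     /* computed whether or not p holds     */
    int32_t else_val = b - a;     /* computed whether or not p holds     */
    return p ? then_val : else_val; /* selection, like a hardware mux    */
}
```

The two functions return the same value for any inputs whose difference fits in `int32_t`. The second form has no control dependence between the comparison and the subtractions, so all three operations could proceed in parallel.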