A common machine language for grid-based architectures
ACM SIGARCH Computer Architecture News
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Language and compiler design for streaming applications
International Journal of Parallel Programming - Special issue: The next generation software program
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Orchestrating stream graphs using model checking
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware scheduled superscalars, are already hitting performance and complexity limits and cannot be scaled indefinitely. The Reconfigurable Architecture Workstation (Raw) is a simple, wire-efficient architecture that scales with increasing VLSI gate densities and attempts to provide performance that is at least comparable to that provided by scaling an existing architecture, but that can achieve orders of magnitude more performance for applications in which the compiler can discover and statically schedule fine-grain parallelism. The Raw microprocessor chip comprises a set of replicated tiles, each tile containing a simple RISC like processor, a small amount of configurable logic, and a portion of memory for instructions and data. Each tile has an associated programmable switch which connects the tiles in a wide-channel point-to-point interconnect. The compiler statically schedules multiple streams of computations, with one program counter per tile. The interconnect provides register-to-register communication with very low latency and can also be statically scheduled. The compiler is thus able to schedule instruction-level parallelism across the tiles and exploit the large number of registers and memory ports. Of course, Raw provides backup dynamic support in the form of flow control for situations in which the compiler cannot determine a precise static schedule. The Raw architecture can be viewed as replacing the bus architecture of superscalar processors with a switched interconnect and accomplishing at compile time operations such as register renaming and instruction scheduling. This paper makes a case for the Raw architecture and provides early results on the plausibility of static compilation for several small benchmarks. We have implemented a prototype Raw processor (called RawLogic) and an associated compilation system by leveraging commercial FPGA based logic emulation technology. RawLogic supports many of the features of the Raw architecture and demonstrates that we can write compilers to statically orchestrate all the communication and computation in multiple threads, and that performance levels 10x-100x over workstations is achievable for many applications.