Baring it all to Software: The Raw Machine

  • Authors:
  • E. Waingold;M. Taylor;V. Sarkar;V. Lee;W. Lee;J. Kim;M. Frank;P. Finch;S. Devabhaktuni;R. Barua;J. Babb;S. Amarasinghe;A. Agarwal

  • Venue:
  • Baring it all to Software: The Raw Machine
  • Year:
  • 1997

Abstract

Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware-scheduled superscalars, are already hitting performance and complexity limits and cannot be scaled indefinitely. The Reconfigurable Architecture Workstation (Raw) is a simple, wire-efficient architecture that scales with increasing VLSI gate densities and attempts to provide performance that is at least comparable to that provided by scaling an existing architecture, but that can achieve orders of magnitude more performance for applications in which the compiler can discover and statically schedule fine-grain parallelism. The Raw microprocessor chip comprises a set of replicated tiles, each tile containing a simple RISC-like processor, a small amount of configurable logic, and a portion of memory for instructions and data. Each tile has an associated programmable switch which connects the tiles in a wide-channel point-to-point interconnect. The compiler statically schedules multiple streams of computations, with one program counter per tile. The interconnect provides register-to-register communication with very low latency and can also be statically scheduled. The compiler is thus able to schedule instruction-level parallelism across the tiles and exploit the large number of registers and memory ports. Of course, Raw provides backup dynamic support in the form of flow control for situations in which the compiler cannot determine a precise static schedule. The Raw architecture can be viewed as replacing the bus architecture of superscalar processors with a switched interconnect and accomplishing at compile time operations such as register renaming and instruction scheduling. This paper makes a case for the Raw architecture and provides early results on the plausibility of static compilation for several small benchmarks.
We have implemented a prototype Raw processor (called RawLogic) and an associated compilation system by leveraging commercial FPGA-based logic emulation technology. RawLogic supports many of the features of the Raw architecture and demonstrates that we can write compilers to statically orchestrate all the communication and computation in multiple threads, and that performance levels 10x-100x over workstations are achievable for many applications.
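
The idea of compile-time scheduling across tiles can be illustrated with a toy model. The sketch below is not the Raw compiler or the RawLogic system. The `Op` type, the round-robin placement, the one-dimensional tile row, and all latency values are hypothetical choices made only for illustration. It assigns every operation a tile and a start cycle ahead of time, charging a fixed delay per tile of distance when a value crosses the interconnect.

```python
# Toy model only: the Op type, latencies, 1-D tile row, and round-robin
# placement are illustrative assumptions, not details from the Raw paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    latency: int = 1   # cycles the op occupies its tile
    deps: tuple = ()   # names of ops whose results this op consumes


def static_schedule(ops, n_tiles, hop_latency=1):
    """Assign each op a (tile, start, finish) entirely ahead of time.

    ops must be listed in dependency order. A value produced on another
    tile arrives hop_latency cycles per tile of distance after the
    producer finishes.
    """
    tile_free = [0] * n_tiles   # next free cycle on each tile
    placed = {}                 # op name -> (tile, start, finish)
    for i, op in enumerate(ops):
        tile = i % n_tiles      # naive round-robin placement
        start = tile_free[tile]
        for dep in op.deps:
            dep_tile, _, dep_finish = placed[dep]
            arrival = dep_finish + hop_latency * abs(dep_tile - tile)
            start = max(start, arrival)
        finish = start + op.latency
        placed[op.name] = (tile, start, finish)
        tile_free[tile] = finish
    return placed


def makespan(placed):
    """Total cycles until the last op finishes."""
    return max(finish for _, _, finish in placed.values())


# (a*b) + (c*d): two independent 2-cycle multiplies feed a 1-cycle add.
program = [
    Op("m1", latency=2),
    Op("m2", latency=2),
    Op("add", latency=1, deps=("m1", "m2")),
]

one_tile = static_schedule(program, n_tiles=1)
two_tiles = static_schedule(program, n_tiles=2)
print(makespan(one_tile), makespan(two_tiles))
```

In this toy model the two multiplies run back to back on a single tile, while on two tiles they overlap and the add waits only for one hop of communication. Every start cycle is fixed before execution, so no run-time arbitration is modeled; the dynamic flow control that the abstract mentions as a fallback is left out.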