Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams

  • Authors:
  • Michael Bedford Taylor;Walter Lee;Jason Miller;David Wentzlaff;Ian Bratt;Ben Greenwald;Henry Hoffmann;Paul Johnson;Jason Kim;James Psota;Arvind Saraf;Nathan Shnidman;Volker Strumpen;Matt Frank;Saman Amarasinghe;Anant Agarwal

  • Affiliations:
  • CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology;CSAIL, Massachusetts Institute of Technology

  • Venue:
  • Proceedings of the 31st Annual International Symposium on Computer Architecture
  • Year:
  • 2004

Abstract

This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Raw supports both ILP and streams by routing operands between architecturally-exposed functional units over a point-to-point scalar operand network. This network offers low latency for scalar data transport. Raw manages the effect of wire delays by exposing the interconnect and using software to orchestrate both scalar and stream data transport. We have implemented a prototype Raw microprocessor in IBM's 180 nm, 6-layer copper, CMOS 7SF standard-cell ASIC process. We have also implemented ILP and stream compilers. Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor. Central to achieving this goal is Raw's ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Specifically, we evaluate the performance of Raw on a diverse set of codes including traditional sequential programs, streaming applications, server workloads and bit-level embedded computation. Our experimental methodology makes use of a cycle-accurate simulator validated against our real hardware.
Compared to a 180 nm Pentium-III, using commodity PC memory system components, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x to 9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand. The paper also proposes a new versatility metric and uses it to discuss the generality of Raw.