Merrimac: Supercomputing with Streams

  • Authors:
  • William J. Dally; Francois Labonte; Abhishek Das; Patrick Hanrahan; Jung-Ho Ahn; Jayanth Gummaraju; Mattan Erez; Nuwan Jayasena; Ian Buck; Timothy J. Knight; Ujval J. Kapasi

  • Venue:
  • Proceedings of the 2003 ACM/IEEE conference on Supercomputing
  • Year:
  • 2003

Abstract

Merrimac uses stream architecture and advanced interconnection networks to give an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality using a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence a processing node with a fixed bandwidth (expensive) can support an order of magnitude more arithmetic units (inexpensive). This in turn allows a given level of performance to be achieved with fewer nodes (a 1-PFLOPS machine, for example, with just 8,192 nodes), resulting in greater reliability and simpler system management. We sketch the design of Merrimac, a streaming scientific computer that can be scaled from a $20K, 2-TFLOPS workstation to a $20M, 2-PFLOPS supercomputer, and present the results of some initial application experiments on this architecture.
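
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal prefixes (1 TFLOPS = 10^12 and 1 PFLOPS = 10^15 floating-point operations per second), and the function and variable names are hypothetical, not taken from the paper. It derives the per-node performance implied by the 8,192-node, 1-PFLOPS example and the cost per GFLOPS at both ends of the stated scaling range.

```python
# Back-of-the-envelope check of the numbers quoted in the abstract.
# Assumption: decimal prefixes (tera = 1e12, peta = 1e15).
TERA = 1e12
PETA = 1e15
GIGA = 1e9


def flops_per_node(total_flops, nodes):
    """Per-node performance needed to reach total_flops with the given node count."""
    return total_flops / nodes


def dollars_per_gflops(cost_dollars, total_flops):
    """System cost divided by performance expressed in GFLOPS."""
    return cost_dollars / (total_flops / GIGA)


# "a 1-PFLOPS machine ... with just 8,192 nodes"
per_node = flops_per_node(1 * PETA, 8192)

# "$20K 2 TFLOPS workstation" and "$20M 2 PFLOPS supercomputer"
workstation = dollars_per_gflops(20_000, 2 * TERA)
supercomputer = dollars_per_gflops(20_000_000, 2 * PETA)

print(f"Implied per-node performance: {per_node / GIGA:.1f} GFLOPS")
print(f"Workstation cost:   ${workstation:.2f} per GFLOPS")
print(f"Supercomputer cost: ${supercomputer:.2f} per GFLOPS")
```

Under these assumptions, both configurations work out to the same cost per GFLOPS, which is consistent with the abstract's description of a single design scaled across three orders of magnitude.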