Runnemede: An architecture for Ubiquitous High-Performance Computing

  • Authors:
  • Nicholas P. Carter; Aditya Agrawal; Shekhar Borkar; Romain Cledat; Howard David; Dave Dunning; Joshua Fryman; Ivan Ganev; Roger A. Golliver; Rob Knauerhase; Richard Lethin; Benoit Meister; Asit K. Mishra; Wilfred R. Pinfold; Justin Teller; Josep Torrellas; Nicolas Vasilache; Ganesh Venkatesh; Jianping Xu

  • Affiliations:
  • Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Reservoir Labs, New York, USA; Reservoir Labs, New York, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA; University of Illinois at Urbana-Champaign, USA; Reservoir Labs, New York, USA; Intel Labs, Hillsboro, Oregon, USA; Intel Labs, Hillsboro, Oregon, USA

  • Venue:
  • HPCA '13: Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
  • Year:
  • 2013

Abstract

DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design process that considers the hardware, the runtime/OS, and applications simultaneously. Near-threshold voltage operation, fine-grained power and clock management, and separate execution units for runtime and application code are used to reduce energy consumption. Memory energy is minimized through application-managed on-chip memory and direct physical addressing. A hierarchical on-chip network reduces communication energy, and a codelet-based execution model supports extreme parallelism and fine-grained tasks. We present an initial evaluation of Runnemede that shows the design process for our on-chip network, demonstrates 2–4x improvements in memory energy from explicit control of on-chip memory, and illustrates the impact of hardware-software co-design on the energy consumption of a synthetic aperture radar algorithm on our architecture.