Runnemede: An architecture for Ubiquitous High-Performance Computing

Authors:
Nicholas P. Carter;Aditya Agrawal;Shekhar Borkar;Romain Cledat;Howard David;Dave Dunning;Joshua Fryman;Ivan Ganev;Roger A. Golliver;Rob Knauerhase;Richard Lethin;Benoit Meister;Asit K. Mishra;Wilfred R. Pinfold;Justin Teller;Josep Torrellas;Nicolas Vasilache;Ganesh Venkatesh;Jianping Xu
Affiliations:
Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Reservoir Labs, New York, USA;Reservoir Labs, New York, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA;University of Illinois at Urbana-Champaign, USA;Reservoir Labs, New York, USA;Intel Labs, Hillsboro, Oregon, USA;Intel Labs, Hillsboro, Oregon, USA
Venue:
HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Year:
2013

Abstract

DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design process that considers the hardware, the runtime/OS, and applications simultaneously. Near-threshold voltage operation, fine-grained power and clock management, and separate execution units for runtime and application code are used to reduce energy consumption. Memory energy is minimized through application-managed on-chip memory and direct physical addressing. A hierarchical on-chip network reduces communication energy, and a codelet-based execution model supports extreme parallelism and fine-grained tasks. We present an initial evaluation of Runnemede that shows the design process for our on-chip network, demonstrates 2–4x improvements in memory energy from explicit control of on-chip memory, and illustrates the impact of hardware-software co-design on the energy consumption of a synthetic aperture radar algorithm on our architecture.
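
To put the 50 GOPS/Watt target in per-operation terms, the short sketch below converts an efficiency figure into an energy budget per operation. It is only unit arithmetic (1 Watt = 1 Joule per second) and is not taken from the paper; the function name is hypothetical, and the result is a whole-system average rather than a figure for any particular Runnemede component.

```python
def energy_per_op_picojoules(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/Watt to an average energy budget in pJ per operation."""
    # 1 W = 1 J/s, so (operations per second) per Watt equals operations per Joule.
    ops_per_joule = gops_per_watt * 1e9
    # Joules per operation, expressed in picojoules (1 J = 1e12 pJ).
    return 1e12 / ops_per_joule


if __name__ == "__main__":
    uhpc_target = energy_per_op_picojoules(50.0)
    print(f"{uhpc_target:.1f} pJ per operation")  # prints "20.0 pJ per operation"
```

Under this reading, the budget of roughly 20 pJ per operation has to cover computation, memory access, and communication together, which is consistent with the abstract's emphasis on reducing memory and network energy.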