A design methodology for domain-optimized power-efficient supercomputing

Authors:
Marghoob Mohiyuddin;Mark Murphy;Leonid Oliker;John Shalf;John Wawrzynek;Samuel Williams
Affiliations:
University of California at Berkeley, Berkeley, CA and Lawrence Berkeley National Laboratory, Berkeley, CA;University of California at Berkeley, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;University of California at Berkeley, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Abstract

As power has become the pre-eminent design constraint for future HPC systems, computational efficiency is being emphasized over peak performance alone. Recently, static benchmark codes have been used to find power-efficient architectures. Unfortunately, because compilers generate sub-optimal code, benchmark performance can be a poor indicator of the performance potential of architecture design points. Therefore, we present hardware/software co-tuning as a novel approach for system design, in which traditional architecture space exploration is tightly coupled with software auto-tuning to deliver substantial improvements in area and power efficiency. We demonstrate the proposed methodology by exploring the parameter space of a Tensilica-based multi-processor running three of the most heavily used kernels in scientific computing, each with widely varying micro-architectural requirements: sparse matrix-vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning significantly improves hardware area and energy efficiency, a key driver for the next generation of HPC system design.
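
The coupling described in the abstract can be pictured as a nested search: an outer loop over hardware design points and an inner auto-tuning loop over software variants. The Python sketch below is only an illustration under stated assumptions. The design space, the tuning parameter (a block size), and the runtime and power models are all invented for this example and are not taken from the paper; the names `toy_runtime`, `toy_power`, `autotune`, `cotune`, and `fixed_code_choice` are hypothetical.

```python
from itertools import product

# Hypothetical hardware design points (cores, cache size in KB). Not from the paper.
HARDWARE_SPACE = list(product([1, 2, 4], [16, 32, 64]))
# Hypothetical software tuning parameter: block size of a tiled kernel.
SOFTWARE_SPACE = [8, 16, 32, 64]


def toy_runtime(cores, cache_kb, block):
    """Made-up runtime model: blocks that overflow the cache are penalized."""
    working_set_kb = 3 * block * block * 8 / 1024  # three tiles of doubles
    penalty = 1.0 if working_set_kb <= cache_kb else 4.0
    overhead = 1.0 + 8.0 / block  # small blocks pay more loop overhead
    return penalty * overhead / cores


def toy_power(cores, cache_kb):
    """Made-up power model: fixed cost plus a per-core and per-cache cost."""
    return 0.5 + cores * (0.5 + cache_kb / 64.0)


def energy(hw, block):
    """Energy = runtime * power for one (hardware, software) pair."""
    cores, cache_kb = hw
    return toy_runtime(cores, cache_kb, block) * toy_power(cores, cache_kb)


def autotune(hw):
    """Inner loop: best software variant for a fixed hardware point."""
    return min(SOFTWARE_SPACE, key=lambda block: energy(hw, block))


def cotune():
    """Outer loop: each hardware point is judged by its own tuned software."""
    best_hw = min(HARDWARE_SPACE, key=lambda hw: energy(hw, autotune(hw)))
    return best_hw, autotune(best_hw)


def fixed_code_choice(block):
    """Baseline: choose hardware using one fixed, untuned software variant."""
    return min(HARDWARE_SPACE, key=lambda hw: energy(hw, block))


if __name__ == "__main__":
    print("co-tuned choice:", cotune())
    print("fixed-code choice (block=32):", fixed_code_choice(32))
```

In this toy model, evaluating every design point with a single fixed block size steers the search toward a larger cache, while tuning the block size per design point finds a smaller-cache configuration with lower energy. This mirrors the abstract's argument only qualitatively; the numbers carry no meaning beyond the example.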