Selecting representative benchmark inputs for exploring microprocessor design spaces

Authors:
Maximilien B. Breughe; Lieven Eeckhout
Affiliations:
Ghent University, Gent, Belgium; Ghent University, Gent, Belgium
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

The design process of a microprocessor requires representative workloads to steer the search toward an optimum design point for the target application domain. However, considering a broad set of workloads to cover the large space of potential workloads is infeasible given how time-consuming design space exploration typically is. Hence, it is crucial to select a small yet representative set of workloads, which shortens the design cycle while still yielding a (near-)optimal design. Prior work has mostly looked into selecting representative benchmarks; limited attention has been given to the selection of benchmark inputs and how it affects workload representativeness during design space exploration. Using a set of 1,000 inputs for a number of embedded benchmarks and a design space with around 1,700 design points, we find that selecting one or three random inputs per benchmark can, in a worst-case scenario, lead to a suboptimal design that is on average 56% and 33% off, respectively, relative to the optimal design in our design space in terms of Energy-Delay Product (EDP). We then propose and evaluate a number of methods for selecting representative inputs and show that we can find the optimum design point with as few as three inputs.
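
To make the "percent off in terms of EDP" figure concrete, here is a minimal Python sketch of one way such a gap could be computed. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not say how EDP is aggregated across inputs, so the arithmetic mean used here, the table layout, the function names (`mean_edp`, `best_design`, `edp_deficiency`), and the toy numbers are all hypothetical. The only definition taken as given is that EDP is energy multiplied by delay.

```python
from typing import Dict, Sequence

# Hypothetical layout: table[design][input_name] = EDP (energy * delay)
# measured for that design point on that benchmark input.
EdpTable = Dict[str, Dict[str, float]]


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-Delay Product: lower is better."""
    return energy_joules * delay_seconds


def mean_edp(table: EdpTable, design: str, inputs: Sequence[str]) -> float:
    """Average EDP of one design over a set of inputs (assumed aggregation)."""
    return sum(table[design][i] for i in inputs) / len(inputs)


def best_design(table: EdpTable, inputs: Sequence[str]) -> str:
    """Design point with the lowest mean EDP over the given inputs."""
    return min(table, key=lambda d: mean_edp(table, d, inputs))


def edp_deficiency(table: EdpTable, subset: Sequence[str],
                   all_inputs: Sequence[str]) -> float:
    """Relative EDP loss from choosing the design using only `subset`,
    evaluated against the design that is optimal over all inputs."""
    chosen = best_design(table, subset)
    optimal = best_design(table, all_inputs)
    return (mean_edp(table, chosen, all_inputs)
            / mean_edp(table, optimal, all_inputs)) - 1.0


# Toy numbers, invented for illustration only.
table: EdpTable = {
    "small_core": {"a": edp(1.0, 1.0), "b": edp(2.0, 2.0), "c": edp(2.0, 2.0)},
    "big_core":   {"a": edp(2.0, 1.0), "b": edp(2.0, 1.0), "c": edp(2.0, 1.0)},
}
all_inputs = ["a", "b", "c"]

# Input "a" alone favors small_core, which is 50% worse over all inputs.
print(best_design(table, ["a"]), edp_deficiency(table, ["a"], all_inputs))
# Inputs "b" and "c" already lead to the overall optimum.
print(best_design(table, ["b", "c"]), edp_deficiency(table, ["b", "c"], all_inputs))
```

In this toy setup, an unrepresentative single input steers the search to a design that is 50% worse in mean EDP than the true optimum, while a better-chosen pair of inputs recovers the optimum exactly; this is the kind of gap the 56% and 33% figures in the abstract quantify.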