Fine-grained Benchmark Subsetting for System Selection

Authors:
Pablo de Oliveira Castro;Yuriy Kashnikov;Chadi Akel;Mihail Popov;William Jalby
Affiliations:
Université de Versailles Saint-Quentin-en-Yvelines, France and Exascale Computing Research, France;Exascale Computing Research, France;Exascale Computing Research, France;Exascale Computing Research, France;Université de Versailles Saint-Quentin-en-Yvelines, France and Exascale Computing Research, France
Venue:
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2014

Abstract

System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires long-running benchmarks. We propose a method to reduce the cost of system selection. We break down benchmarks into elementary fragments of source code, called codelets. Then, we identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up. For each group of similar codelets, only one representative is kept. For codelets that are called repeatedly and whose performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection. We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite that is 30 times faster on average than the original suite, and up to 44 times faster. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.
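The two redundancy reductions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how codelet similarity is measured or how groups are formed, so the feature vectors, the greedy distance-threshold grouping, the variation cutoff, and every name below (Codelet, group_similar, reduced_invocations, predict_total_time) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean, pstdev


@dataclass
class Codelet:
    name: str
    features: tuple      # hypothetical performance signature (assumption)
    ref_time: float      # time per invocation on the reference machine
    invocations: int     # invocation count in the original benchmark


def group_similar(codelets, threshold):
    """Greedy grouping (assumed technique): a codelet joins the first group
    whose representative lies within `threshold`, else it starts a new group."""
    groups = []  # each group is a list; its first element is the representative
    for c in codelets:
        for g in groups:
            if dist(c.features, g[0].features) <= threshold:
                g.append(c)
                break
        else:
            groups.append([c])
    return groups


def reduced_invocations(call_times, max_cv=0.05, keep=3):
    """Keep only `keep` invocations when per-call timings are stable
    (coefficient of variation at most `max_cv`); otherwise keep all of them."""
    avg = mean(call_times)
    cv = pstdev(call_times) / avg if avg > 0 else 0.0
    return min(keep, len(call_times)) if cv <= max_cv else len(call_times)


def predict_total_time(groups, rep_target_times):
    """Estimate the full suite's time on a target machine while measuring
    only each group's representative there."""
    total = 0.0
    for g in groups:
        rep = g[0]
        # Ratio of target time to reference time, measured on the representative
        # and assumed to hold for every codelet in the same group.
        ratio = rep_target_times[rep.name] / rep.ref_time
        total += sum(c.ref_time * ratio * c.invocations for c in g)
    return total
```

In this sketch only the representatives are run on the target architecture; the remaining codelets are extrapolated from their reference timings, which is where the savings in benchmarking time would come from.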