Numerical recipes: the art of scientific computing
Numerical recipes: the art of scientific computing
The NAS parallel benchmarks—summary and preliminary results
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Workload characterization of emerging computer applications
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences
Proceedings of the 30th annual international symposium on Computer architecture
Measuring Benchmark Similarity Using Inherent Program Characteristics
IEEE Transactions on Computers
Performance prediction based on inherent program similarity
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
IPython: A System for Interactive Scientific Computing
Computing in Science and Engineering
XARK: An extensible framework for automatic recognition of computational kernels
ACM Transactions on Programming Languages and Systems (TOPLAS)
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments
ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Pruning hardware evaluation space via correlation-driven application similarity analysis
Proceedings of the 8th ACM International Conference on Computing Frontiers
A code isolator: isolating code fragments from large programs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Effective source-to-source outlining to support whole program empirical optimization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Simsys: a performance simulation framework
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Is Source-Code Isolation Viable for Performance Characterization?
ICPP '13 Proceedings of the 2013 42nd International Conference on Parallel Processing
Hi-index | 0.00 |
System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires long running benchmarks. We propose a method to reduce the cost of system selection. We break down benchmarks into elementary fragments of source code, called codelets. Then, we identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up. For each group of similar codelets, only one representative is kept. For codelets called repeatedly and for which the performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection. We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite 30 times faster in average than the original suite, with a maximum of 44 times. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.