Hierarchical Means: Single Number Benchmarking with Workload Cluster Analysis

Authors:
Richard M. Yoo;Hsien-Hsin S. Lee;Han Lee;Kingsum Chow
Affiliations:
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332. yoo@ece.gatech.edu;School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332. leehs@ece.gatech.edu;Managed Runtime Division, Software and Solutions Group, Intel Corp., Hillsboro, OR 97123. han.lee@intel.com;Managed Runtime Division, Software and Solutions Group, Intel Corp., Hillsboro, OR 97123. kingsum.chow@intel.com
Venue:
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Year:
2007

Abstract

Benchmark suite scores are typically calculated by averaging the performance of each individual workload, so the scores are inherently affected by the distribution of workloads. Because the applications in a benchmark suite are typically contributed by many consortium members, workload redundancy becomes inevitable. In particular, merging benchmarks can significantly increase artificial redundancy. Redundancy in the workloads of a benchmark suite biases the benchmark scores, making the score of a suite susceptible to malicious tweaks. The current standard workaround for the redundancy issue is to weight each individual workload during the final score calculation. Unfortunately, such weight-based score adjustment can significantly undermine the credibility and objectivity of benchmark scores. In this paper, we propose a set of benchmark suite score calculation methods, called the hierarchical means, that incorporate cluster analysis to amortize the negative effect of workload redundancy. These methods not only improve the accuracy and robustness of the score, but also improve objectivity over the weight-based approach. They can also be used to quantitatively analyze the inherent redundancy and cluster characteristics of a new benchmark suite under evaluation. In our case study, the hierarchical geometric mean was applied to a hypothetical Java benchmark suite that attempts to model the upcoming release of the new SPECjvm benchmark suite. We also show that benchmark suite clustering depends heavily on how the workloads are characterized.
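
The abstract does not spell out the formula, so the following is only a minimal sketch of one plausible reading of a hierarchical geometric mean: take the geometric mean within each cluster, then the geometric mean across the cluster-level results. The workload names, scores, and cluster assignments are hypothetical, and the clusters are assumed to be given rather than derived by cluster analysis as in the paper.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty set of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def hierarchical_geometric_mean(scores, clusters):
    """Assumed two-level scheme: mean within each cluster, then across clusters.

    scores:   workload name -> performance score
    clusters: workload name -> cluster label (assumed to be precomputed)
    """
    groups = defaultdict(list)
    for name, score in scores.items():
        groups[clusters[name]].append(score)
    return geometric_mean(geometric_mean(g) for g in groups.values())


# Hypothetical suite: three redundant workloads (a, b, c) and one distinct one (d).
scores = {"a": 4.0, "b": 4.0, "c": 4.0, "d": 1.0}
clusters = {"a": 0, "b": 0, "c": 0, "d": 1}

plain = geometric_mean(scores.values())               # redundant cluster counts three times
hier = hierarchical_geometric_mean(scores, clusters)  # each cluster counts once

print(f"plain geometric mean:        {plain:.3f}")
print(f"hierarchical geometric mean: {hier:.3f}")
```

In this toy example the three redundant workloads pull the plain geometric mean toward their shared score, whereas the two-level mean lets each cluster contribute once without any hand-assigned weights.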