Hierarchical Means: Single Number Benchmarking with Workload Cluster Analysis

  • Authors:
  • Richard M. Yoo; Hsien-Hsin S. Lee; Han Lee; Kingsum Chow

  • Affiliations:
  • School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332. yoo@ece.gatech.edu; School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332. leehs@ece.gatech.edu; Managed Runtime Division, Software and Solutions Group, Intel Corp., Hillsboro, OR 97123. han.lee@intel.com; Managed Runtime Division, Software and Solutions Group, Intel Corp., Hillsboro, OR 97123. kingsum.chow@intel.com

  • Venue:
  • IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
  • Year:
  • 2007

Abstract

Benchmark suite scores are typically calculated by averaging the performance of the individual workloads, so they are inherently affected by the distribution of workloads. Because the applications in a benchmark suite are typically contributed by many consortium members, workload redundancy is inevitable. In particular, merging benchmarks can significantly increase artificial redundancy. Redundancy among the workloads of a suite biases the benchmark score and leaves it susceptible to malicious tweaks. The current standard workaround for the redundancy issue is to weight each individual workload in the final score calculation. Unfortunately, such weight-based score adjustment can significantly undermine the credibility and objectivity of benchmark scores. In this paper, we propose a set of benchmark suite score calculation methods, called the hierarchical means, that incorporate cluster analysis to amortize the negative effect of workload redundancy. These methods not only improve the accuracy and robustness of the score but are also more objective than the weight-based approach. In addition, they can be used to analyze quantitatively the inherent redundancy and cluster characteristics of a new benchmark suite. In our case study, we apply the hierarchical geometric mean to a hypothetical Java benchmark suite that attempts to model the upcoming release of the new SPECjvm benchmark suite. We also show that benchmark suite clustering depends heavily on how the workloads are characterized.
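
The abstract does not spell out the formula, so the following is only a minimal sketch of how a hierarchical geometric mean might be computed. It assumes the score is the geometric mean of the per-cluster geometric means, and that cluster labels have already been produced by some cluster analysis. The function names, workload names, scores, and cluster assignments are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("geometric_mean requires at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def hierarchical_geometric_mean(scores, clusters):
    """Assumed definition: geometric mean of the per-cluster geometric means.

    scores   maps workload name -> positive performance score
    clusters maps workload name -> cluster label (from a prior cluster analysis)
    """
    groups = defaultdict(list)
    for name, score in scores.items():
        groups[clusters[name]].append(score)
    return geometric_mean(geometric_mean(g) for g in groups.values())


# Hypothetical suite: workload "a" appears three times in near-identical form.
scores = {"a": 2.0, "a_copy1": 2.0, "a_copy2": 2.0, "b": 8.0}
clusters = {"a": 0, "a_copy1": 0, "a_copy2": 0, "b": 1}

plain = geometric_mean(scores.values())               # redundancy pulls the score toward "a"
hier = hierarchical_geometric_mean(scores, clusters)  # each cluster counts once
```

In this toy suite the plain geometric mean is about 2.83, because the three redundant copies dominate, while the hierarchical variant gives about 4.0, the same value a suite containing only "a" and "b" would produce. When every workload forms its own cluster, the two scores coincide.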