MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences

Authors:
Daniel Citron
Affiliations:
Haifa University Campus, Haifa, Israel
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Abstract

A majority of the papers published in leading computer architecture conferences use SPEC CPU2000, or its predecessor SPEC CPU95, which has become the de facto standard for measuring processor and/or memory-hierarchy performance. However, in most cases only a subset of the suite's benchmarks is simulated. For example, of the 27 papers published in ISCA 2002, 16 used SPEC CINT2000, 4 used the whole suite, and only 3 explained their omissions.

This paper quantifies the extent of this phenomenon in the ISCA, Micro, and HPCA conferences: 173 papers were surveyed, 115 used benchmarks from SPEC CINT, but only 23 used the whole suite. If the current trend continues, by the year 2005 80% of the papers will use the full CINT2000 suite, a year after CPU2004 shall be announced.

We claim that results based upon a subset of a benchmark suite are speculative and conflict with Amdahl's Law. The law implies that we must present the speedup obtained by the proposed technique on the whole suite. Projecting the law (by statistically supplying values for the missing benchmarks) onto several published papers reduces promising results to average ones. Speedups are reduced from 1.42 to 1.16 in one case, from 1.43 to 1.13 in another, and from 1.76 to 1.15 in a third.

Finally, we have found that the disregard for CFP2000 is unwarranted in papers that explore the data cache domain: the suite displays a higher data cache miss rate than CINT2000, which is used more frequently.
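The Amdahl's Law argument in the abstract can be illustrated with a minimal sketch. It is not the paper's actual procedure: the abstract says missing values are supplied statistically, whereas this sketch simply assumes that every benchmark contributes equal baseline time and that each omitted benchmark gets a neutral speedup of 1.0. The function name `projected_suite_speedup`, its parameters, and the input numbers are hypothetical and are not taken from any surveyed paper.

```python
def projected_suite_speedup(subset_speedups, suite_size, assumed_missing_speedup=1.0):
    """Project a whole-suite speedup from speedups reported on a subset.

    Assumptions (not from the paper): every benchmark has the same
    baseline run time (1 unit), and each omitted benchmark is assigned
    `assumed_missing_speedup`.
    """
    if not subset_speedups:
        raise ValueError("at least one reported speedup is required")
    if len(subset_speedups) > suite_size:
        raise ValueError("subset cannot be larger than the suite")
    if assumed_missing_speedup <= 0 or any(s <= 0 for s in subset_speedups):
        raise ValueError("speedups must be positive")

    missing = suite_size - len(subset_speedups)
    # Baseline time is 1 unit per benchmark, so the new time of a
    # benchmark with speedup s is 1 / s.
    new_time = sum(1.0 / s for s in subset_speedups)
    new_time += missing / assumed_missing_speedup
    # Amdahl's Law: overall speedup = total old time / total new time.
    return suite_size / new_time


# Hypothetical example: 4 of the 12 CINT2000 benchmarks, each sped up 1.5x.
subset = [1.5, 1.5, 1.5, 1.5]
print(f"subset-only speedup:   {len(subset) / sum(1.0 / s for s in subset):.3f}")
print(f"projected whole suite: {projected_suite_speedup(subset, 12):.3f}")
```

Under these assumptions, a 1.5x speedup on a third of the suite shrinks to roughly 1.125x over the whole suite. This is the same kind of reduction the abstract reports for published results.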