A Top-Down Approach to Architecting CPI Component Performance Counters

Authors:
Stijn Eyerman;Lieven Eeckhout;Tejas Karkhanis;James E. Smith
Affiliations:
Ghent University;Ghent University;Advanced Micro Devices;University of Wisconsin-Madison
Venue:
IEEE Micro
Year:
2007

References (8):
Continuous profiling: where have all the cycles gone?
  ACM Transactions on Computer Systems (TOCS)
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
  MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Pentium 4 Performance-Monitoring Features
  IEEE Micro
Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors
  PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
A First-Order Superscalar Processor Model
  Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling
  ACM Transactions on Architecture and Code Optimization (TACO)
A performance counter architecture for computing accurate CPI components
  Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
The Inhibition of Potential Parallelism by Conditional Jumps
  IEEE Transactions on Computers

Cited by (5):
A mechanistic performance model for superscalar out-of-order processors
  ACM Transactions on Computer Systems (TOCS)
Investigating the impact of code generation on performance characteristics of integer programs
  Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Bias scheduling in heterogeneous multi-core architectures
  Proceedings of the 5th European conference on Computer systems
Pruning hardware evaluation space via correlation-driven application similarity analysis
  Proceedings of the 8th ACM International Conference on Computing Frontiers
CRQ-based fair scheduling on composable multicore architectures
  Proceedings of the 26th ACM international conference on Supercomputing

Abstract

Software developers can gain insight into software-hardware interactions by decomposing processor performance into individual cycles-per-instruction components that differentiate cycles consumed in active computation from those spent handling various miss events. Constructing accurate CPI components for out-of-order superscalar processors is complicated, however, because computation and miss event handling overlap. The authors' counter architecture, using an analytical superscalar performance model, handles overlap effects more accurately than existing methods.
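
As a rough illustration of what a CPI decomposition looks like, the sketch below divides per-component cycle counts by the number of retired instructions to form a "CPI stack". It is not the authors' counter architecture: it assumes the cycles have already been attributed to exactly one component each, which is precisely the step that overlap in out-of-order processors makes hard. The function name, component names, and all numbers are hypothetical and chosen only for illustration.

```python
def cpi_stack(instructions, cycles_by_component):
    """Return the CPI contribution of each component.

    Assumes every cycle has already been attributed to exactly one
    component (no double counting of overlapped miss events).
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return {
        name: cycles / instructions
        for name, cycles in cycles_by_component.items()
    }


# Hypothetical counter readings for one program run (made-up values).
instructions_retired = 1_000_000
attributed_cycles = {
    "base": 500_000,               # cycles doing useful computation
    "l1_icache_miss": 50_000,      # front-end miss events
    "l2_dcache_miss": 300_000,     # long-latency back-end misses
    "branch_mispredict": 150_000,  # pipeline refill after mispredictions
}

stack = cpi_stack(instructions_retired, attributed_cycles)
total_cpi = sum(stack.values())

for name, value in stack.items():
    print(f"{name:>18}: {value:.2f}")
print(f"{'total CPI':>18}: {total_cpi:.2f}")
```

Because the components partition the total cycle count, they sum to the overall CPI. A counting scheme that charged overlapped cycles to more than one miss event would break that property, which is why the attribution step matters more than the division itself.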