Using Interaction Costs for Microarchitectural Bottleneck Analysis

Authors:
Brian A. Fields;Rastislav Bodík;Mark D. Hill;Chris J. Newburn
Affiliations:
University of California-Berkeley;University of California-Berkeley;University of Wisconsin-Madison;Intel Corporation
Venue:
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2003

Abstract

Attacking bottlenecks in modern processors is difficult because many microarchitectural events overlap with each other. This parallelism makes it difficult to both (a) assign a cost to an event (e.g., to one of two overlapping cache misses) and (b) assign blame for each cycle (e.g., for a cycle where many overlapping resources are active). This paper introduces a new model for understanding event costs to facilitate processor design and optimization. First, we observe that everything in a machine (instructions, hardware structures, events) can interact in only one of two ways (in parallel or serially). We quantify these interactions by defining interaction cost, which can be zero (independent, no interaction), positive (parallel), or negative (serial). Second, we illustrate the value of using interaction costs in processor design and optimization. Finally, we propose performance-monitoring hardware for measuring interaction costs that is suitable for modern processors.
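
To make the three cases in the abstract concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not spell out: the cost of a set of events is taken to be the cycles saved when those events are idealized to zero latency, and the interaction cost of two events is the cost of idealizing both together minus the sum of their individual costs. The toy machine model (execution time as the longest of several independent chains) and the names `exec_time`, `cost`, and `icost` are hypothetical and chosen only for illustration.

```python
# Toy model (an assumption for illustration): a program is a set of
# independent chains of events; each event is (name, latency in cycles).
# Execution time is the length of the longest chain.

def exec_time(chains, idealized=frozenset()):
    """Longest chain length, with idealized events taking zero cycles."""
    return max(
        sum(0 if name in idealized else latency for name, latency in chain)
        for chain in chains
    )

def cost(chains, events):
    """Cycles saved by idealizing every event in `events` together."""
    return exec_time(chains) - exec_time(chains, frozenset(events))

def icost(chains, a, b):
    """Interaction cost: joint cost minus the two individual costs."""
    return cost(chains, {a, b}) - cost(chains, {a}) - cost(chains, {b})

# Two overlapping 100-cycle cache misses: removing one alone saves nothing.
parallel = [[("m1", 100)], [("m2", 100)]]

# Events a and b in series, overlapped by a 90-cycle event c.
serial = [[("a", 50), ("b", 50)], [("c", 90)]]

# Events a and b back to back with nothing overlapping them.
independent = [[("a", 30), ("b", 20)]]

print(icost(parallel, "m1", "m2"))   # 100  (positive: parallel interaction)
print(icost(serial, "a", "b"))       # -10  (negative: serial interaction)
print(icost(independent, "a", "b"))  # 0    (no interaction)
```

In the parallel case each miss has an individual cost of zero because the other miss still holds up execution, so the whole 100 cycles appear only in the interaction term. In the serial case, idealizing either event alone already exposes the 90-cycle chain, so adding the two individual costs double-counts the saving and the interaction term is negative.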