An analysis of the effects of miss clustering on the cost of a cache miss

Authors:
Thomas R. Puzak, A. Hartstein, P. G. Emma, V. Srinivasan, Jim Mitchell
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY (all authors)
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Abstract

In this paper we describe a new technique, called pipeline spectroscopy, and use it to measure the cost of each cache miss. The miss costs are graphed as a histogram, giving a precise, detailed view of the cost of every cache miss across all levels of the memory hierarchy. We call these graphs 'spectrograms' because they reveal signature features of the processor's memory hierarchy, of the pipeline, and of the miss pattern itself. We then present two examples that use spectroscopy to optimize either the processor's hardware or an application's software. The first example demonstrates how a miss spectrogram can help software designers analyze the performance of an application. The second example uses a miss spectrogram to analyze bus queueing. Our experiments show that performance gains of up to 8% are possible. Detailed analysis of a spectrogram yields much greater insight into pipeline dynamics, including the effects of miss clustering, miss overlap, prefetching, and miss queueing delays.
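The abstract does not say how a stall cycle is charged when several misses are outstanding at once. The sketch below is a minimal illustration under one stated assumption, not the authors' method. It assumes that each pipeline stall cycle is split equally among all misses outstanding in that cycle. The trace format (one `(issue_cycle, return_cycle)` pair per miss plus a list of stall cycles) and the helper names `miss_costs` and `spectrogram` are hypothetical. The sketch shows how overlapping (clustered) misses end up with a lower per-miss cost than an isolated miss. It also shows how the resulting costs bin into a histogram of the kind the abstract calls a spectrogram.

```python
from collections import Counter
from typing import Dict, List, Tuple


def miss_costs(misses: List[Tuple[int, int]], stall_cycles: List[int]) -> List[float]:
    """Charge each stall cycle to the misses outstanding during it.

    misses: hypothetical (issue_cycle, return_cycle) pairs; a miss is
            outstanding on the half-open interval [issue, return).
    stall_cycles: cycles in which the pipeline made no progress.
    ASSUMPTION: a stall cycle is shared equally by all outstanding misses.
    """
    costs = [0.0] * len(misses)
    for cycle in stall_cycles:
        outstanding = [i for i, (start, end) in enumerate(misses) if start <= cycle < end]
        if not outstanding:
            continue  # stall with no outstanding miss: not charged to any miss
        share = 1.0 / len(outstanding)
        for i in outstanding:
            costs[i] += share
    return costs


def spectrogram(costs: List[float], bin_width: int = 10) -> Dict[int, int]:
    """Histogram of miss costs: bin lower bound (cycles) -> number of misses."""
    hist = Counter(int(c // bin_width) * bin_width for c in costs)
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Two clustered misses (overlapping in time) and one isolated miss.
    misses = [(0, 100), (10, 110), (200, 300)]
    stalls = list(range(20, 110)) + list(range(220, 300))
    costs = miss_costs(misses, stalls)
    print(costs)                             # [40.0, 50.0, 80.0]
    print(spectrogram(costs, bin_width=50))  # {0: 1, 50: 2}
```

In this toy trace, the two clustered misses share 80 stall cycles, so each costs less than the isolated miss that stalls the pipeline alone for 80 cycles. The quadratic scan over misses is chosen for clarity. A real trace would call for an interval sweep instead.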