ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
When prefetching improves/degrades performance
Proceedings of the 2nd conference on Computing frontiers
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
Hi-index | 0.00 |
In this paper we describe a new technique, called pipeline spectroscopy, and use it to measure the cost of each cache miss. The cost of a miss is displayed (graphed) as a histogram, which represents a precise readout showing a detailed visualization of the cost of each cache miss throughout all levels of the memory hierarchy. We call the graphs 'spectrograms' because they reveal certain signature features of the processor's memory hierarchy, the pipeline, and the miss pattern itself. Next we provide two examples that use spectroscopy to optimize the processor's hardware or application's software. The first example demonstrates how a miss spectrogram can aid software designers in analyzing the performance of an application. The second example uses a miss spectrogram to analyze bus queueing. Our experiments show that performance gains of up to 8% are possible. Detailed analysis of a spectrogram leads to much greater insight in pipeline dynamics, including effects due to miss cluster, miss overlap, prefetching, and miss queueing delays.