Design of the IBM Enterprise System/9000 high-end processor
IBM Journal of Research and Development
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Code transformations to improve memory parallelism
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism
Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
Hi-index | 0.00 |
Pipeline Spectroscopy is a new technique that allows us to measure the cost of each cache miss. The cost of a miss is displayed (graphed) as a histogram, which represents a precise readout showing a detailed visualization of the cost of each cache miss throughout all levels of the memory hierarchy. We call the graphs 'spectrograms' because they reveal certain signature characteristics of the processor's memory hierarchy, the pipeline, and the miss pattern itself. Cache miss spectrograms are produced by analyzing misses according to the miss cluster size, and comparing instruction sequences and execution times that occurred near the miss cluster in a 'finite cache' simulation run to the same set of instructions and execution times in an 'infinite cache' run, then calculating the difference in run times. We show that in a memory hierarchy with N cache levels (L1, L2, ..., LN, and memory) and a miss cluster of size C, there are (C + N C) possible miss penalties. This represent all possible sums from all possible combinations of the miss latencies from each level of the memory hierarchy (L2, L3, ... Memory) for a given cluster size. Additionally, a theory is presented that describes the shape of a spectrogram, and we use this theory to predict the shape of spectrograms for larger miss clusters. Detailed analysis of a spectrograph leads to much greater insight in pipeline dynamics, including effects due to prefetching, and miss queueing delays.