Static load classification for improving the value predictability of data-cache misses
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
IEEE Transactions on Computers
Hybridizing and Coalescing Load Value Predictors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse
IEEE Transactions on Computers
VPC3: a fast and effective trace-compression algorithm
Proceedings of the joint international conference on Measurement and modeling of computer systems
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Automatic Generation of High-Performance Trace Compressors
Proceedings of the international symposium on Code generation and optimization
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
The VPC Trace-Compression Algorithms
IEEE Transactions on Computers
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
TCgen 2.0: a tool to automatically generate lossless trace compressors
ACM SIGARCH Computer Architecture News
Identification, Modelling and Prediction of Non-periodic Bursts in Workloads
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
The potential of using dynamic information flow analysis in data value prediction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Improving performance through deep value profiling and specialization with code transformation
Computer Languages, Systems and Structures
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction
ACM Transactions on Architecture and Code Optimization (TACO)
Exploring the potential of architecture-level power optimizations
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Stride: search-based deterministic replay in polynomial time via bounded linkage
Proceedings of the 34th International Conference on Software Engineering
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Hi-index | 0.01 |
Most load value predictors retain a large number of previously loaded values for making future predictions. In this paper we evaluate the trade-off between tall and slim versus short and wide predictors of the same total size, i.e., between retaining a few values for a large number of load instructions and many values for a proportionately smaller number of loads. Our results show, for example, that even modest predictors holding sixteen kilobytes of values benefit from retaining four values per load instruction when running SPECint95.A detailed comparison of eight load value predictors on a cycle-accurate simulator of a superscalar out-of-order microprocessor shows that our implementation of a last four value predictor outperforms other predictors from the literature, often significantly. With 21kB of state, it yields a harmonic mean speedup of 12.5% with existing re-fetch misprediction recovery hardware and 13.7% with a not yet realized re-execution recovery mechanism.