Memory access latency has traditionally been a bottleneck for system performance. Caches were introduced to bridge the large speed gap between processor and main memory and thereby reduce the load-to-use delay. Even the traditional one-cycle cache latency causes problems for pipelined execution, and the situation worsens for modern deeply pipelined superscalar processors: as clock rates rise and cache capacities grow, cache latencies inevitably stretch to multiple cycles. Load address prediction can alleviate the load-to-use delay by predicting the target address of a load instruction in an early pipeline stage, but existing address prediction schemes can predict only up to 67% of regular address patterns. To explore the potential of address prediction, in this paper we study and simulate various load address change patterns from the perspective of program behavior. Our results first show that load addresses are highly repetitive. We then classify load instructions into several natural categories and analyze the behavior of each. The reasons for both correct and incorrect address predictions are studied from program behavior, and load instructions with low prediction rates are analyzed further. To address the high misprediction rate for loads of scalar stack variables, we propose a new prediction scheme, stack coloring. We also propose a new context predictor, the global context predictor, which greatly reduces the resources required for prediction. Our results show that prediction accuracies of 77.5%, 75.8%, and 90.6% can be achieved for the stride, context, and hybrid predictors respectively.
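To make the baseline concrete, the stride predictor discussed above can be sketched as a per-PC table that records the last observed load address and the difference between the last two addresses, predicting `last_address + stride`. This is a minimal illustrative sketch, not the paper's exact design: the dictionary-based table, the lack of confidence counters, and the class/method names are all simplifying assumptions.

```python
class StridePredictor:
    """Minimal per-PC stride load-address predictor (illustrative sketch).

    For each load PC, remember the last observed address and the stride
    between the last two addresses; predict next = last_address + stride.
    A real hardware table would be finite and tagged, and would typically
    add confidence counters -- omitted here for clarity.
    """

    def __init__(self):
        # pc -> (last_address, stride); an unbounded dict stands in for
        # a fixed-size prediction table.
        self.table = {}

    def predict(self, pc):
        """Return the predicted address for this load PC, or None if
        no history has been recorded yet."""
        entry = self.table.get(pc)
        if entry is None:
            return None
        last_addr, stride = entry
        return last_addr + stride

    def update(self, pc, actual_addr):
        """Train the predictor with the load's resolved address."""
        entry = self.table.get(pc)
        if entry is None:
            # First sighting: no stride information yet, assume 0.
            self.table[pc] = (actual_addr, 0)
        else:
            last_addr, _ = entry
            self.table[pc] = (actual_addr, actual_addr - last_addr)
```

A load sweeping an array of 4-byte elements (addresses 100, 104, 108, ...) is predicted correctly after two training updates, which is exactly the regular pattern the abstract says existing schemes capture; pointer-chasing loads break the constant-stride assumption and motivate the context and hybrid predictors.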