Abstract: In this paper, we examine the behavior of three of the best-performing branch prediction strategies proposed in the literature when several threads of instructions execute simultaneously. Our simulations show that in a multiprogramming environment, if the sizes of the tables (PHT/BTB) are proportional to the number of active threads, there is very little interaction between threads. With parallel workloads, one might expect a beneficial sharing effect. In fact, this effect depends strongly on the branch predictor, and even in the best case the gains remain very limited. We also show that, for all three predictors, in both multiprogramming and parallel processing, conflicts in the BTB cause a significant increase in mispredictions when the table sizes are kept small. However, for parallel processing with the gshare scheme, the resulting misprediction ratios for 2 or 4 threads remain below those of a single thread. Finally, we study the impact of adding one Return Address Stack per context and show that a 12-entry stack per thread is sufficient to greatly improve branch prediction accuracy.
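To make the two structures discussed in the abstract concrete, the following is a minimal illustrative sketch, not the simulator used in the paper: a gshare predictor whose pattern history table (PHT) is shared by all hardware threads, and one fixed-depth Return Address Stack per thread context. Several details are assumptions chosen only for illustration: each thread keeps its own global history register, the 2-bit counters start weakly not-taken, the PC is shifted right by 2 (4-byte instructions), a full stack discards its oldest entry, and the class and method names are hypothetical. The BTB is not modeled.

```python
from collections import deque
from typing import Optional


class GsharePredictor:
    """gshare sketch: (PC XOR global history) indexes a table of 2-bit counters.

    Assumption for illustration: one PHT is shared by all hardware threads,
    while each thread keeps its own global history register.
    """

    def __init__(self, index_bits: int, num_threads: int) -> None:
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 2-bit counters, start weakly not-taken
        self.ghr = [0] * num_threads        # one global history register per thread

    def _index(self, tid: int, pc: int) -> int:
        # Drop the two low PC bits (assumed 4-byte instructions), then XOR with history.
        return ((pc >> 2) ^ self.ghr[tid]) & self.mask

    def predict(self, tid: int, pc: int) -> bool:
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self._index(tid, pc)] >= 2

    def update(self, tid: int, pc: int, taken: bool) -> None:
        i = self._index(tid, pc)
        c = self.pht[i]
        self.pht[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the resolved outcome into this thread's history only.
        self.ghr[tid] = ((self.ghr[tid] << 1) | int(taken)) & self.mask


class ReturnAddressStack:
    """Fixed-depth return address stack; one instance per thread context.

    When full, a push silently discards the oldest entry, so very deep call
    chains lose their outermost return addresses.
    """

    def __init__(self, depth: int = 12) -> None:
        self.entries: deque = deque(maxlen=depth)

    def push(self, return_addr: int) -> None:   # on a call
        self.entries.append(return_addr)

    def pop(self) -> Optional[int]:              # on a return; None means no prediction
        return self.entries.pop() if self.entries else None


# Example: two threads share one gshare PHT but each owns a 12-entry RAS.
bp = GsharePredictor(index_bits=10, num_threads=2)
ras = [ReturnAddressStack(12) for _ in range(2)]

for _ in range(20):
    bp.update(0, 0x4000, taken=True)   # thread 0: a loop branch that is always taken
print(bp.predict(0, 0x4000))           # True

for depth in range(14):                # thread 1: 14 nested calls overflow its 12-entry RAS
    ras[1].push(0x1000 + 4 * depth)
popped = [ras[1].pop() for _ in range(13)]
print(popped[0], popped[-1])           # 4148 None (0x1034, then the stack is empty)
```

In this sketch, scaling the PHT in proportion to the number of active threads, as in the multiprogramming experiments described above, means adding one index bit when going from 1 to 2 threads and two bits for 4 threads. Because the RAS is private to each context, one thread's calls and returns never corrupt another thread's return predictions; only overflow within a single thread loses addresses.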