Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic path-based branch correlation
Proceedings of the 28th annual international symposium on Microarchitecture
Control flow prediction with tree-like subgraphs for superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Control Flow Speculation in Multiscalar Processors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Load execution latency reduction
ICS '98 Proceedings of the 12th international conference on Supercomputing
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving prediction for procedure returns with return-address-stack repair mechanisms
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Variable length path branch prediction
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Trace Cache Microarchitecture and Evaluation
IEEE Transactions on Computers - Special issue on cache memory and related problems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Proceedings of the 27th annual international symposium on Computer architecture
Completion time multiple branch prediction for enhancing trace cache performance
Proceedings of the 27th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications
IEEE Transactions on Computers
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures
International Journal of Parallel Programming
On Augmenting Trace Cache for High-Bandwidth Value Prediction
IEEE Transactions on Computers
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Selecting long atomic traces for high coverage
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Proceedings of the 30th annual international symposium on Computer architecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
A low-complexity fetch architecture for high-performance superscalar processors
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Programmable Hardware Path Profiler
Proceedings of the international symposium on Code generation and optimization
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Incremental Commit Groups for Non-Atomic Trace Processing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
Branch predictor guided instruction decoding
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
Evaluating trace cache energy efficiency
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Computer Systems (TOCS)
Unified control flow and data dependence traces
ACM Transactions on Architecture and Code Optimization (TACO)
The potential of trace-level parallelism in Java programs
Proceedings of the 5th international symposium on Principles and practice of programming in Java
IEEE Transactions on Computers
A latency-conscious SMT branch prediction architecture
International Journal of High Performance Computing and Networking
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
A study of potential parallelism among traces in Java programs
Science of Computer Programming
Creating artificial global history to improve branch prediction accuracy
Proceedings of the 23rd international conference on Supercomputing
International Journal of Modelling and Simulation
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
The significance of affectors and affectees correlations for branch prediction
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Improving branch prediction by considering affectors and affectees correlations
Transactions on high-performance embedded architectures and compilers III
ISCIS'06 Proceedings of the 21st international conference on Computer and Information Sciences
An offline approach for whole-program paths analysis using suffix arrays
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Target encoding for efficient indirect jump prediction
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Improving instruction delivery with a block-aware ISA
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Exploiting intra-function correlation with the global history stack
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Limits of region-based dynamic binary parallelization
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Cachetor: detecting cacheable data to remove bloat
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
The trace cache has been proposed as a mechanism for providing increased fetch bandwidth by allowing the processor to fetch across multiple branches in a single cycle. But to date predicting multiple branches per cycle has meant paying a penalty in prediction accuracy. We propose a next trace predictor that treats the traces as basic units and explicitly predicts sequences of traces. The predictor collects histories of trace sequences (paths) and makes predictions based on these histories. The basic predictor is enhanced to a hybrid configuration that reduces performance losses due to cold starts and aliasing in the prediction table. The Return History Stack is introduced to increase predictor performance by saving path history information across procedure call/returns. Overall, the predictor yields about a 26% reduction in misprediction rates when compared with the most aggressive previously proposed, multiple-branch-prediction methods.