A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers
Combinatorial algorithms for integrated circuit layout
Combinatorial algorithms for integrated circuit layout
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Slack: maximizing performance under technological constraints
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Difficult-path branch prediction using subordinate microthreads
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic Branch Decoupled Architecture
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
A Trace Based Evaluation of Speculative Branch Decoupling
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Hi-index | 0.00 |
Conditional branch induced control hazards cause significant performance loss in modern out-of-order superscalar processors. Dynamic branch prediction techniques help alleviate the penalties associated with conditional branch instructions. However, branches still constitute one of the main hurdles towards achieving higher ILP. Dynamic branch prediction relies on the temporal locality of and spatial correlations between branches. Branch decoupling is yet another mechanism that exploits the innate lead in the branch schedule with respect to the rest of the computation. The compiler is responsible for generating the two maximally decoupled instruction streams: branch stream and program stream. Our earlier work on trace based evaluation of branch decoupling demonstrates a performance advantage of between 12% to 46% over 2-level branch prediction. However, how much of these gains are achievable through static, compiler driven decoupling is not known. This paper answers the question partially. A novel decoupling algorithm that integrates graph bi-partitioning and scheduling, was deployed in the GNU C compiler to generate a two instruction stream executable. These executables were targeted to branch decoupled architecture simulator with superscalar cores for the branch stream and program stream processors. Simulations show an average performance improvement of 7.7% and 5.5% for integer and floating point benchmarks of the SPEC2000 benchmark suite respectively.