Folding active list for high performance and low power
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Revisiting reorder buffer architecture for next generation high performance computing
The Journal of Supercomputing
We consider two approaches for reducing the complexity and power dissipation in processors that use a separate register file to maintain committed register values. The first approach relies on a distributed implementation of the Reorder Buffer (ROB) that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match the FU workload and with one write port and two read ports on each component. The second approach combines the use of the previously proposed retention latches and a distributed ROB implementation that uses minimally-ported distributed components. Such a combination avoids any read and write port conflicts on the distributed ROB components (with the exception of possible port conflicts in the course of commitment) and does not incur the associated performance degradation. Our designs are evaluated using simulation of the SPEC 2000 benchmarks and SPICE simulations of the actual ROB layouts in a 0.18-micron process. ROB power savings of up to 49% can be realized with only a 1.7% performance loss on average.
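To make the distributed organization more concrete, the following is a minimal behavioral sketch in Python and not the authors' design. It assumes that each FU owns a small ROB component holding speculative results, and that a central FIFO of (FU, slot, destination register) records preserves program order for commitment. The class name `DistributedROB`, its methods, the FIFO of records, and the component sizes are all hypothetical and chosen only for illustration. Port counts, retention latches, and power are not modeled.

```python
from collections import deque


class DistributedROB:
    """Toy model of a ROB split into per-FU components (illustrative assumption)."""

    def __init__(self, sizes):
        # sizes: hypothetical mapping of FU name -> entries in that FU's component
        self.free = {fu: list(range(n)) for fu, n in sizes.items()}
        self.values = {fu: [None] * n for fu, n in sizes.items()}
        # Central FIFO keeps program order; it stores pointers, not results.
        self.order = deque()

    def dispatch(self, fu, dest_reg):
        """Allocate a slot in the FU's component; return None if it is full."""
        if not self.free[fu]:
            return None  # component full: dispatch would stall
        slot = self.free[fu].pop()
        self.values[fu][slot] = None  # None marks "result not yet produced"
        self.order.append((fu, slot, dest_reg))
        return slot

    def writeback(self, fu, slot, value):
        """The FU writes its result into its own component."""
        self.values[fu][slot] = value

    def commit(self, regfile):
        """Retire the oldest instruction if its result is ready."""
        if not self.order:
            return False
        fu, slot, dest_reg = self.order[0]
        value = self.values[fu][slot]
        if value is None:
            return False  # head not finished: commitment stays in order
        self.order.popleft()
        regfile[dest_reg] = value  # copy into the separate committed register file
        self.values[fu][slot] = None
        self.free[fu].append(slot)
        return True
```

In this sketch, a result is written only into the component of the FU that produced it, while commitment still proceeds in program order through the central FIFO. Sizing each component differently (for example, more entries for an integer ALU than for a multiplier) mirrors the idea of matching each component to its FU workload.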