FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Hi-index | 0.00 |
This paper evaluates the use of redundant binary and pipelined 2's complement adders in out-of-order execution cores. Redundant binary adders reduce the ADD latency to less than half that of traditional 2's complement adders, allowing higher core clock frequencies and greater execution bandwidth (in instructions per second). Pipelined 2's complement adders allow a higher clock frequency, but do not reduce the ADD latency. Machines with redundant binary adders are compared to machines with 2's complement adders and the same execution bandwidth and bypass network complexity. Results show that on the SPECint95 benchmarks, the average IPC of an 8-wide machine with 1-cycle redundant binary adders is 9% higher than a machine using 2-cycle pipelined adders. Pipelined functional units and multi-cycle register files may require multi-level bypass networks to guarantee that an instruction's result is available any cycle after it is produced. Multi-level bypass networks require large fan-in input muxes that increase cycle time. This paper shows that one level of bypass paths in a multi-level bypass network can be removed while still achieving within 3% to 1% of the IPC of a machine with a full bypass network.