Modelling the hardware cost of full register bypassing in a multiple instruction issue processor
Journal of Systems Architecture: the EUROMICRO Journal - Special quintuple issue: Euromicro 1995 short contributions
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Pipelining and Bypassing in a VLIW Processor
IEEE Transactions on Parallel and Distributed Systems
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Power Reduction in VLIW Processor with Compiler Driven Bypass Network
VLSID '07 Proceedings of the 20th International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems
A configurable multi-ported register file architecture for soft processor cores
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
MMIXware: a RISC computer for the third millennium
MMIXware: a RISC computer for the third millennium
Retargetable pipeline hazard detection for partially bypassed processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Low-power data forwarding for VLIW embedded architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exact and approximate algorithms for the extension of embedded processor instruction sets
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
In this paper, a scalable scheme, configurable via register-transfer level parameters, for full register bypassing in a modern embedded processor architecture, termed ByoRISC, is presented. The register bypassing specification is parameterized regarding the number of homogeneous register file read and write ports and the number of pipeline stages of the processor. The performance characteristics (cycle time, chip area) of the proposed technique have been evaluated for FPGA target implementations of the synthesizable ByoRISC model. It is proved that, a full bypassing network is a viable solution for the elimination of data hazards when servicing instructions with multiple read and write operands. While the maximum clock frequency is reduced by 17.9% in average, when using partial versus full forwarding, the positive effect of custom computation eliminates this effect by providing cycle speedups of 3.9x to 5.5x and corresponding execution time speedups for a ByoRISC testbed processor of 3.6x. Individual application speedups of up to 9.4x have also been obtained.