Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Skew-tolerant circuit design
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Master-Slave Adaptive Load-Distribution Processor Model on PCA
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Dual-mode floating-point adder architectures
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
As pipeline width and depth grow to improve performance, memory arrays in microprocessors are growing in entries and ports. Arrays will increase in physical size, which prolongs the access time due to wiring delay. In order to boost clock frequency, these memory arrays must take multiple cycles to complete an access. This delays the scheduling of dependent instructions and affects overall performance. This paper proposes a different circuit organization to enable fast and slow accesses solely dependent on physical locality. Since the access time depends on a fixed physical location, it is pre-determined to scheduling dependent instructions. Furthermore, this paper presents a mechanism to re-configure the address decoding of the physical register file to increase the occurrence of fast access. Detailed circuit simulation using this proposed method determines the access cycle time. Reduction in average access cycle time for the register file and the first level data cache recovers 73% of the IPC degradation.