As semiconductor feature sizes decrease, interconnect delay is becoming a dominant component of processor cycle time. This creates a critical need to shift the focus of microarchitectural design from operation computation to operand transport. The operand bypass networks of out-of-order superscalar processors are particularly demanding of wiring resources, and forwarding path delay has become a limiting factor in processor performance. This paper proposes a novel technology-based methodology that evaluates bypass network configurations by predicting operand transport cost. It combines technology modeling techniques with cycle-accurate simulation of benchmark applications to characterize operand movement and storage requirements. Our analysis shows that operand transport cost depends heavily on the physical location of the functional units (FUs) and on the instruction steering strategy. We propose two techniques: a traffic-based placement, which positions FUs according to the transport distribution pattern, and a geometry-driven instruction steering, which tries to assign each pair of dependent instructions to adjacent computing resources. Performance is evaluated on an aggressive eight-way processor with 16 functional units operating at 1.9 GHz in 100 nm technology. When the two techniques are combined, the IPC penalty resulting from wire delay can be kept within 6.8% of the ideal zero-bypass-delay processor for Spec2000Int and within 5.5% for MediaBench.
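The two ideas named in the abstract can be illustrated with a toy model. The sketch below is not the paper's methodology: it assumes operand transport cost is simply the transfer count between two FUs multiplied by the Manhattan distance between their slots, it uses exhaustive search as a stand-in for the placement algorithm, and every name and number in it (`transport_cost`, `best_placement`, `steer`, the traffic figures) is hypothetical.

```python
from itertools import permutations

def manhattan(a, b):
    """Manhattan distance between two (row, col) slots."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def transport_cost(traffic, placement):
    """Assumed cost model: transfers weighted by producer-consumer distance."""
    return sum(count * manhattan(placement[src], placement[dst])
               for (src, dst), count in traffic.items())

def best_placement(fus, slots, traffic):
    """Traffic-based placement by brute force (only viable for a few FUs)."""
    best_cost, best_map = None, None
    for perm in permutations(slots, len(fus)):
        placement = dict(zip(fus, perm))
        cost = transport_cost(traffic, placement)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, placement
    return best_cost, best_map

def steer(producer_fu, free_fus, placement):
    """Geometry-driven steering: pick the free FU nearest the producer."""
    return min(free_fus,
               key=lambda fu: manhattan(placement[fu], placement[producer_fu]))

# Hypothetical profile: operand transfers observed between FU pairs.
fus = ["alu0", "alu1", "mul", "ld"]
slots = [(0, 0), (0, 1), (0, 2), (0, 3)]  # four slots in a single row
traffic = {("alu0", "alu1"): 50, ("ld", "alu0"): 30,
           ("mul", "alu1"): 5, ("ld", "mul"): 1}

# A traffic-oblivious layout for comparison.
naive = {"alu0": (0, 0), "mul": (0, 1), "ld": (0, 2), "alu1": (0, 3)}
naive_cost = transport_cost(traffic, naive)
placed_cost, placed = best_placement(fus, slots, traffic)

# Steer a consumer of alu0's result to whichever free FU sits closest.
chosen = steer("alu0", ["mul", "ld"], placed)
print(naive_cost, placed_cost, chosen)
```

In this toy profile the heavily communicating pairs end up in adjacent slots, which is the intuition behind placing FUs by traffic pattern. A real evaluation would replace the distance metric with a technology-derived wire delay model and take the traffic counts from cycle-accurate simulation, as the abstract describes.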