Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
An empirical study of decentralized ILP execution models
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic code partitioning for clustered architectures
International Journal of Parallel Programming - parallel architectures and compilation techniques, part II
Advanced Computer Architecture: Parallelism,Scalability,Programmability
Advanced Computer Architecture: Parallelism,Scalability,Programmability
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The Alpha 21264: A 500 MHz Out-of-Order Execution Microprocessor
COMPCON '97 Proceedings of the 42nd IEEE International Computer Conference
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
Package level interconnect options
Proceedings of the 2005 international workshop on System level interconnect prediction
Constant Impedance Scaling Paradigm for Scaling LC transmission lines
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Hi-index | 0.00 |
In the sub-micron technology era, wire delays are becoming much more important than gate delays, making it particularly attractive to go for clustered designs. A common form of clustering adopted in processors is to replace the centralized instruction scheduler with multiple smaller schedulers that work in parallel within a single chip. Studies have found that existing interconnects connecting onchip clusters, as well as proposed instruction distribution algorithms, are not scalable. The objective of this paper is to investigate alternate interconnects (we investigate hierarchical interconnects) that provide scalable performance with increase in on-chip clusters. We also investigate distribution algorithms that are best suited for these interconnects. Experimental results of these new interconnects with appropriate distribution techniques show that they more scalable than the existing techniques. achieve an IPC that is around 15-20% more than the most scalable existing configuration, and is also within 2% of that achieved by a hypothetical ideal processor having a 1-cycle latency crossbar interconnect, irrespective of the number of clusters; confirming their utility and applicability In this paper, we also discuss the many other design advantages that are obtained by the use of hierarchical interconnects.