The multiscalar architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An empirical study of decentralized ILP execution models
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
A Cost-Effective Clustered Architecture
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Complexity-effective superscalar processors
Complexity-effective superscalar processors
Inherently lower-power high-performance superscalar architectures
Inherently lower-power high-performance superscalar architectures
Interface Design Techniques for Single-Chip Systems
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Improving dynamic cluster assignment for clustered trace cache processors
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
Inherently Workload-Balanced Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Cache organizations for clustered microarchitectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors
IEEE Transactions on Parallel and Distributed Systems
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency
IEEE Transactions on Computers
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture
Proceedings of the 2006 international symposium on Low power electronics and design
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
Exploiting subtrace-level parallelism in clustered processors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Clustering is an effective microarchitectural technique for reducing the impact of wire delays, the complexity, and the power requirements of microprocessors. In this work, we investigate the design of on-chip interconnection networks for clustered microarchitectures. This new class of interconnects has different demands and characteristics than traditional multiprocessor networks. In a clustered microarchitecture, a low inter-cluster communication latency is essential for high performance.We propose point-to-point interconnects together with an effective latency-aware instruction steering scheme and show that they achieve much better performance than bus-based interconnects. The results show that the connectivity of the network together with latency-aware steering schemes are key for high performance. We also show that these interconnects can be built with simple hardware and achieve a performance close to that of an idealized contention-free model.