Efficient Interconnects for Clustered Microarchitectures

Authors:
Joan-Manuel Parcerisa, Julio Sahuquillo, Antonio González, José Duato
Venue:
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Year:
2002

Abstract

Clustering is an effective microarchitectural technique for reducing the impact of wire delays, the complexity, and the power requirements of microprocessors. In this work, we investigate the design of on-chip interconnection networks for clustered microarchitectures. This new class of interconnects has different demands and characteristics from traditional multiprocessor networks. In a clustered microarchitecture, low inter-cluster communication latency is essential for high performance. We propose point-to-point interconnects together with an effective latency-aware instruction steering scheme and show that they achieve much better performance than bus-based interconnects. The results show that the connectivity of the network, together with latency-aware steering, is key to high performance. We also show that these interconnects can be built with simple hardware and achieve performance close to that of an idealized contention-free model.
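The abstract does not spell out the steering algorithm. As a rough illustration of what "latency-aware" steering over a point-to-point network can mean, the Python sketch below sends an instruction to the cluster where its last source operand would become available earliest, counting inter-cluster transfer hops. The bidirectional ring topology, the fixed per-hop latency, the occupancy tie-breaker, and every name here (`ring_distance`, `Operand`, `operand_ready_at`, `steer`) are illustrative assumptions, not the authors' design.

```python
# Hypothetical sketch of latency-aware instruction steering.
# Assumptions (not from the paper): clusters form a bidirectional ring,
# each hop costs a fixed number of cycles, and ties are broken by
# per-cluster occupancy, then by cluster index.
from dataclasses import dataclass


def ring_distance(src: int, dst: int, n_clusters: int) -> int:
    """Hops between two clusters on a bidirectional ring (shortest direction)."""
    d = abs(src - dst) % n_clusters
    return min(d, n_clusters - d)


@dataclass
class Operand:
    cluster: int      # cluster that produces (or holds) the value
    ready_cycle: int  # cycle at which the value is available in that cluster


def operand_ready_at(op: Operand, dst: int, n_clusters: int, hop_latency: int) -> int:
    """Cycle at which `op` can be read in cluster `dst`, including transfer delay."""
    return op.ready_cycle + hop_latency * ring_distance(op.cluster, dst, n_clusters)


def steer(operands, n_clusters, hop_latency=1, occupancy=None):
    """Return the cluster where the instruction's last operand arrives earliest."""
    occupancy = occupancy or [0] * n_clusters

    def cost(c):
        # The instruction can issue only once its latest operand has arrived.
        ready = max(
            (operand_ready_at(op, c, n_clusters, hop_latency) for op in operands),
            default=0,
        )
        return (ready, occupancy[c], c)

    return min(range(n_clusters), key=cost)
```

For example, with four clusters, one operand ready at cycle 10 in cluster 0 and another ready at cycle 5 in cluster 2, `steer` picks cluster 0 (last operand available at cycle 10) over cluster 2 (cycle 12). A bus would replace the hop-based distance with a uniform latency plus contention, which is the trade-off the abstract evaluates.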