Dynamic Code Partitioning for Clustered Architectures

Authors:
Ramon Canal;Joan-Manuel Parcerisa;Antonio González
Affiliations:
Department d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Cr. Jordi Girona, 1–3 Mòdul D6, 08034 Barcelona, Spain. {rcanal, jmanel, antonio}@ac.upc.es
Venue:
International Journal of Parallel Programming
Year:
2001

Citing 14
Cited 3

The multiscalar architecture

The multiscalar architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupling integer execution in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
Decoupled access/execute computer architectures

ACM Transactions on Computer Systems (TOCS)
Will Physical Scalability Sabotage Performance Gains?

Computer
Distributed Modulo Scheduling

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture

Dynamically managing the communication-parallelism trade-off in future clustered processors

Proceedings of the 30th annual international symposium on Computer architecture
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors

IEEE Transactions on Parallel and Distributed Systems
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency

IEEE Transactions on Computers

Abstract

Recent work^(1) shows that delays introduced in the issue and bypass logic will become critical for wide-issue superscalar processors. One proposed solution is to cluster the processor core. Clustered architectures benefit from a less complex, partitioned processor core and thus incur less critical delays. In this paper, we propose dynamic instruction steering logic for these clustered architectures that decides at decode time the cluster in which each instruction is executed. The performance of clustered architectures depends on the inter-cluster communication overhead and the workload balance. We present a scheme that uses run-time information to optimize the trade-off between these two factors. The evaluation shows that this scheme can achieve an average speed-up of 35% over a conventional 8-way issue (4 int + 4 fp) machine and that it outperforms previous proposals, whether static or dynamic.
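
The trade-off the abstract describes, between inter-cluster communication and workload balance, can be illustrated with a small sketch. This is not the steering scheme evaluated in the paper, whose details are not given here; it is a generic heuristic written under stated assumptions. The names `SteeringState`, `steer`, `pending`, `reg_home`, and `imbalance_threshold` are all hypothetical. The sketch sends each instruction to the cluster that already holds most of its source operands (to avoid communication), unless that cluster is more loaded than the least loaded one by more than a threshold (to preserve balance).

```python
from dataclasses import dataclass, field


@dataclass
class SteeringState:
    """Hypothetical bookkeeping kept by the decode-time steering logic."""
    num_clusters: int = 2
    imbalance_threshold: int = 4                   # assumed tuning knob, not from the paper
    pending: list = field(default_factory=list)    # instructions steered to each cluster
    reg_home: dict = field(default_factory=dict)   # register name -> cluster producing it

    def __post_init__(self):
        if not self.pending:
            self.pending = [0] * self.num_clusters


def steer(state, srcs, dest):
    """Choose a cluster for one instruction with sources `srcs` and destination `dest`."""
    clusters = range(state.num_clusters)
    least_loaded = min(clusters, key=lambda c: state.pending[c])

    # Count how many source operands are produced in each cluster.
    votes = [0] * state.num_clusters
    for reg in srcs:
        if reg in state.reg_home:
            votes[state.reg_home[reg]] += 1

    # Prefer the cluster with the most operands; break ties toward lower load.
    preferred = max(clusters, key=lambda c: (votes[c], -state.pending[c]))

    # Follow the operands unless that would leave the clusters too unbalanced.
    if state.pending[preferred] - state.pending[least_loaded] > state.imbalance_threshold:
        chosen = least_loaded
    else:
        chosen = preferred

    state.pending[chosen] += 1
    if dest is not None:
        state.reg_home[dest] = chosen
    return chosen
```

With a small threshold, a chain of dependent instructions stays in one cluster until the imbalance limit is exceeded and then moves to the other cluster, at the cost of one inter-cluster transfer. A real implementation would also decrement the per-cluster counters as instructions issue or commit, which this sketch omits.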