Evaluation of bus based interconnect mechanisms in clustered VLIW architectures

Authors:
Anup Gangwar; M. Balakrishnan; Preeti Ranjan Panda; Anshul Kumar
Affiliations:
Freescale Semiconductors Pvt. Ltd., Noida, India; Department of Computer Science and Engineering, IIT Delhi, New Delhi, India; Department of Computer Science and Engineering, IIT Delhi, New Delhi, India; Department of Computer Science and Engineering, IIT Delhi, New Delhi, India
Venue:
International Journal of Parallel Programming
Year:
2007

Abstract

With sophisticated new compiler technology, it is possible to schedule distant instructions efficiently. As a consequence, the amount of exploitable instruction-level parallelism (ILP) in applications has gone up considerably. However, VLIW architectures with a monolithic register file present scalability problems, because the centralized register file is far slower than the functional units (FUs). Clustered VLIW architectures, in which each register file (RF) is connected to only a subset of the FUs, provide an attractive solution to this issue. Recent studies of a wide variety of inter-cluster interconnection mechanisms have reported substantial performance gains (measured in number of cycles) over the widely studied RF-to-RF interconnections. However, these studies compared only one or two design points in the RF-to-RF interconnect design space. In this paper, we extend this previously reported work and consider both multicycle and pipelined buses. To obtain realistic bus latencies, we synthesized the various architectures and calculated their post-layout clock periods. The results demonstrate that while interconnect area varies by less than 10%, the bus-based architectures are slower by as much as 400%. Moreover, neither multicycle nor pipelined buses, nor an increase in the number of buses, achieves performance comparable to point-to-point interconnects.
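The abstract's central point is that cycle counts alone can mislead: a design that needs fewer cycles can still be slower once its post-layout clock period is taken into account, and a bus whose wire delay exceeds one clock period must be either multicycle or pipelined. The sketch below is a minimal, hypothetical model of that reasoning. It is not the authors' evaluation framework, and the names `Design`, `bus_latency_cycles`, and `transfer_cycles`, as well as all numbers, are illustrative assumptions. Execution time is modeled as cycles times clock period. A multicycle bus is assumed to be held for its full latency per transfer, while a pipelined bus is assumed to accept a new transfer every cycle.

```python
import math
from dataclasses import dataclass

# Hypothetical model for illustration only; no numbers here come from the paper.


@dataclass
class Design:
    name: str
    cycles: int             # schedule length in cycles (e.g. from a compiler/simulator)
    clock_period_ns: float  # clock period, e.g. obtained after layout

    def exec_time_ns(self) -> float:
        # Wall-clock execution time = number of cycles x clock period.
        return self.cycles * self.clock_period_ns


def bus_latency_cycles(wire_delay_ns: float, clock_period_ns: float) -> int:
    """Cycles one transfer needs on a bus whose wire delay is wire_delay_ns."""
    return max(1, math.ceil(wire_delay_ns / clock_period_ns))


def transfer_cycles(n_transfers: int, latency: int, pipelined: bool) -> int:
    """Cycles to complete n back-to-back transfers on a single bus.

    Assumption: a multicycle bus is occupied for `latency` cycles per
    transfer, so transfers serialize. A pipelined bus accepts a new transfer
    every cycle, so only the first transfer pays the full latency.
    """
    if n_transfers == 0:
        return 0
    if pipelined:
        return latency + (n_transfers - 1)
    return latency * n_transfers


# Hypothetical numbers chosen only to show the effect.
p2p = Design("point-to-point", cycles=1000, clock_period_ns=2.0)
bus = Design("bus", cycles=900, clock_period_ns=5.0)
print(p2p.exec_time_ns(), bus.exec_time_ns())  # 2000.0 4500.0

lat = bus_latency_cycles(wire_delay_ns=4.5, clock_period_ns=2.0)  # 3 cycles
print(transfer_cycles(4, lat, pipelined=False))  # 12
print(transfer_cycles(4, lat, pipelined=True))   # 6
```

In this toy setting the bus design wins on cycle count (900 vs. 1000) but loses on execution time because of its longer clock period. Pipelining improves throughput over a multicycle bus, but it does not remove the per-transfer latency.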