A scalable wide-issue clustered VLIW with a reconfigurable interconnect

Authors:
Osvaldo Colavin; Davide Rizzo
Affiliations:
STMicroelectronics, Inc., San Diego, CA (both authors)
Venue:
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2003

Abstract

Clustered VLIW architectures have been widely adopted in modern embedded multimedia applications because they exploit high degrees of ILP at a reasonable cost in complexity and silicon. Studies have shown, however, that performance scales poorly on wide-issue machines. In this paper we describe the architecture of a clustered VLIW with a runtime-reconfigurable inter-cluster bus designed to address this scalability problem. The architecture targets the acceleration of kernel loops through a coprocessor approach and allows the interconnect between neighboring register files to be customized before each loop execution. We have adopted an inter-cluster communication mechanism based on a constant-complexity interconnect: because its complexity and latency are independent of the number of clusters, scalability with issue width is preserved. To handle the limited connectivity, the interconnection resources in the inter-cluster bus are exposed to the compiler and scheduled like other resources with an adapted version of modulo scheduling. Other relevant features include the ability to define shifting queues in the register files for more effective software pipelining support. Adding a limited amount of reconfigurability to the well-established VLIW programming model results in low-overhead inter-cluster communication and a scalable ILP architecture. Simulation results show that near-linear scalability can be achieved for certain classes of kernel loops.
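
The abstract's idea of scheduling interconnect resources "like other resources" under modulo scheduling can be illustrated with a small modulo reservation table. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: the resource names (`c0.alu`, `link.c0-c1`), the operation list, and the greedy placement policy are all hypothetical, and data dependences are reduced to a single "earliest cycle" per operation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical operation description: (name, resource, earliest_cycle).
# Inter-cluster links are modeled as ordinary resources, next to ALUs
# and memory ports. Dependences are simplified to an earliest cycle.
Op = Tuple[str, str, int]


def try_schedule(ops: List[Op], ii: int) -> Optional[Dict[str, int]]:
    """Greedily place ops in a modulo reservation table for interval ii.

    Returns a name -> cycle mapping, or None if some op finds no free slot.
    """
    occupied = set()  # (resource, cycle mod ii) pairs already reserved
    placement: Dict[str, int] = {}
    for name, resource, earliest in ops:
        # Trying ii consecutive cycles covers every row of the table once.
        for cycle in range(earliest, earliest + ii):
            slot = (resource, cycle % ii)
            if slot not in occupied:
                occupied.add(slot)
                placement[name] = cycle
                break
        else:
            return None  # resource conflict at this initiation interval
    return placement


def min_ii(ops: List[Op], max_ii: int = 64) -> Tuple[int, Dict[str, int]]:
    """Return the smallest initiation interval that yields a placement."""
    for ii in range(1, max_ii + 1):
        placement = try_schedule(ops, ii)
        if placement is not None:
            return ii, placement
    raise ValueError("no schedule found up to max_ii")


# Hypothetical loop body: two values cross the same link between
# neighboring clusters c0 and c1 in every iteration.
ops: List[Op] = [
    ("load_a", "c0.mem", 0),
    ("mul", "c0.alu", 1),
    ("send_x", "link.c0-c1", 2),
    ("send_y", "link.c0-c1", 2),
    ("add", "c1.alu", 3),
]

ii, placement = min_ii(ops)
print(ii, placement)
```

In this toy loop the single link between the two clusters is the bottleneck: it is used twice per iteration, so no schedule exists at an initiation interval of 1 and the search settles on 2. A real scheduler would also honor dependence latencies and recurrence constraints, which this sketch leaves out.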