Highly concurrent scalar processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Partitioned register file for TTAs
Proceedings of the 28th annual international symposium on Microarchitecture
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
High-quality operation binding for clustered VLIW datapaths
Proceedings of the 38th annual Design Automation Conference
A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
An Architectural Overview of the Programmable Multimedia Processor, TM-1
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Treegion Scheduling for Wide Issue Processors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Trace Scheduling: A Technique for Global Microcode Compaction
IEEE Transactions on Computers
Cluster assignment of global values for clustered VLIW processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
A scalable wide-issue clustered VLIW with a reconfigurable interconnect
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 1st conference on Computing frontiers
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
Proceedings of the 1st conference on Computing frontiers
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A unified processor architecture for RISC & VLIW DSP
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Enabling compiler flow for embedded VLIW DSP processors with distributed register files
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Journal of Signal Processing Systems
Journal of Signal Processing Systems
Energy-aware register file re-partitioning for clustered VLIW architectures
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures
International Journal of Parallel Programming
A Multi-Shared Register File Structure for VLIW Processors
Journal of Signal Processing Systems
Optimizing scheduling and intercluster connection for application-specific DSP processors
IEEE Transactions on Signal Processing
An efficient heuristic for instruction scheduling on clustered vliw processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
Clustering is a well-known technique to improve the implementation of single register file VLIW processors. Many previous studies in clustering adhere to an inter-cluster communication means in the form of copy operations. This paper, however, identifies and evaluates fivedifferent inter-cluster communication models, including copy operations, dedicated issue slots, extended operands, extended results, and broadcasting. Our study reveals that these models have a major impact on performance and implementation of the clustered VLIW. We found that copy operations executed in regular VLIW issue slots significantly constrain the scheduling freedom of regular operations. For example, in the dense code for our four clustermachine the total cycle count overhead reached 46.8% with respect to the unicluster architecture, 56% of which are caused by the copy operation constraint. Therefore, we propose to use other models (e.g. extended results or broadcasting), which deliver higher performance than the copy operation model at the same hardware cost.