A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Dependence flow graphs: an algebraic approach to program dependencies
POPL '91 Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The named-state register file
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor
Proceedings of the 25th annual international symposium on Computer architecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
The Alpha 21264 Microprocessor
IEEE Micro
The TigerSHARC DSP Architecture
IEEE Micro
Caching processor general registers
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Building Multithreaded Architectures with Off-the-Shelf Microprocessors
Proceedings of the 8th International Symposium on Parallel Processing
Non-Consistent Dual Register Files to Reduce Register Pressure
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
An Eight Issue Tree-VLIW Processor for Dynamic Binary Translation
ICCD '98 Proceedings of the International Conference on Computer Design
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Microarchitecture and compilation support for clustered instruction-level parallel processors
Microarchitecture and compilation support for clustered instruction-level parallel processors
Inter-Cluster Communication Models for Clustered VLIW Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Cluster assignment of global values for clustered VLIW processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
A Multi-Shared Register File Structure for VLIW Processors
Journal of Signal Processing Systems
Hi-index | 0.00 |
In Clustered Instruction-level Parallel (ILP) processors, the function units are partitioned and resources such as register file and cache are either partitioned or replicated and then grouped together into on-chip clusters. We present a novel partitioned register file architecture for clustered ILP processors which exploits the temporal locality of references to remote registers in a cluster and combines multiple inter-cluster communication operations into a single broadcast operation using a new sendb instruction. Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers. We present an efficient code generation algorithm to schedule sendb operations on-the-fly. Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as that of a partitioned register file with infinite non-architected register space for keeping remote registers.