Implementing Precise Interrupts in Pipelined Processors
IEEE Transactions on Computers
The performance impact of incomplete bypassing in processor pipelines
Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors
Proceedings of the 28th annual international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Exploiting data forwarding to reduce the power budget of VLIW embedded processors
Proceedings of the conference on Design, automation and test in Europe
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Integrating superscalar processor components to implement register caching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Dynamic management of scratch-pad memory space
Proceedings of the 38th annual Design Automation Conference
Register allocation by priority-based coloring
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Caching processor general registers
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Complexity-effective superscalar processors
Region-based register allocation for epic architectures
Region-based register allocation for epic architectures
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Computer Organization and Design
Computer Organization and Design
Differential register allocation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Bypass aware instruction scheduling for register file power reduction
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Impact of Software Bypassing on Instruction Level Parallelism and Register File Traffic
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Hi-index | 0.00 |
This paper proposes a novel scheme to mitigate the register pressure for statically scheduled high-performance embedded processors without physically enlarging the register file. Our scheme exploits the fact that a large fraction of variables are short-lived, which do not need to be written to or read from real registers. Instead, the compiler can allocate these short-lived variables to the virtual registers, which are simply place holders (instead of physical storage locations in the register file) to identify dependences among instructions. Our experimental results demonstrate that virtual registers are very effective at reducing the number of register spills; which, in many cases, can achieve the performance close to the processor with twice number of real registers. Also, our results indicate that for some multimedia and communication applications, using a large number of virtual registers with a small number of real registers can even achieve higher performance than that of a mid-sized register file without any virtual registers.