The SPARC architecture manual: version 8
The SPARC architecture manual: version 8
Register connection: a new approach to adding registers into instruction set architectures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Power analysis of embedded software: a first step towards software power minimization
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low-power design
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Linear scan register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Profile guided selection of ARM and thumb instructions
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
ARM Architecture Reference Manual
ARM Architecture Reference Manual
Code Generation for Embedded Processors
Code Generation for Embedded Processors
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Allocating Lifetimes to Queues in Software Pipelined Architectures
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference
Proceedings of the 40th annual Design Automation Conference
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A linear-time heuristic for improving network partitions
DAC '82 Proceedings of the 19th Design Automation Conference
Register Queues: A New Hardware/Software Approach to Efficient Software Pipelining
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Region-based register allocation for epic architectures
Region-based register allocation for epic architectures
Increasing the number of effective registers in a low-power processor using a windowed register file
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Variable partitioning for dual memory bank DSPs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Efficient Use of Invisible Registers in Thumb Code
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A 16-bit, low-power microsystem with monolithic MEMS-LC clocking
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Minimal placement of bank selection instructions for partitioned memory architectures
ACM Transactions on Embedded Computing Systems (TECS)
Efficient compilation for queue size constrained queue processors
Parallel Computing
Register file partitioning and recompilation for register file power reduction
ACM Transactions on Design Automation of Electronic Systems (TODAES)
ECOOP'07 Proceedings of the 2007 conference on Object-oriented technology
Implementation, compilation, optimization of object-oriented languages, programs and systems
ECOOP'06 Proceedings of the 2006 conference on Object-oriented technology: ECOOP 2006 workshop reader
Register file partitioning and compiler support for reducing embedded processor power consumption
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Journal of Supercomputing
Hi-index | 14.98 |
Low-power embedded processors utilize compact instruction encodings to achieve small code size. Such encodings place tight restrictions on the number of bits available to encode operand specifiers and, thus, on the number of architected registers. As a result, performance and power are often sacrificed as the burden of operand supply is shifted from the register file to the memory due to the limited number of registers. In this paper, we investigate the use of a windowed register file to address this problem by providing more registers than allowed in the encoding. The registers are organized as a set of identical register windows where, at each point in the execution, there is a single active window. Special window management instructions are used to change the active window and to transfer values between windows. This design gives the appearance of a large register file without compromising the instruction encoding. To support the windowed register file, we designed and implemented a graph partitioning-based compiler algorithm that partitions program variables and temporaries referenced within a procedure across multiple windows. On a 16-bit embedded processor, an average of 11 percent improvement in application performance and 25 percent reduction in system power was achieved as an 8-register design was scaled from one to two windows.