The SPARC architecture manual: version 8
The SPARC architecture manual: version 8
Register connection: a new approach to adding registers into instruction set architectures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Power analysis of embedded software: a first step towards software power minimization
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low-power design
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Code Generation for Embedded Processors
Code Generation for Embedded Processors
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Allocating Lifetimes to Queues in Software Pipelined Architectures
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference
Proceedings of the 40th annual Design Automation Conference
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Register Queues: A New Hardware/Software Approach to Efficient Software Pipelining
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Region-based register allocation for epic architectures
Region-based register allocation for epic architectures
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Variable partitioning for dual memory bank DSPs
ICASSP '01 Proceedings of the Acoustics, Speech, and Signal Processing, 200. on IEEE International Conference - Volume 02
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor
IEEE Transactions on Computers
Allocating architected registers through differential encoding
ACM Transactions on Programming Languages and Systems (TOPLAS)
Register pointer architecture for efficient embedded processors
Proceedings of the conference on Design, automation and test in Europe
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
Hi-index | 0.00 |
Low-power embedded processors utilize compact instruction encodings to achieve small code size. Instruction sizes of 8 to 16 bits are common. Such encodings place tight restrictions on the number of bits available to encode operand specifiers, and thus on the number of architected registers. The central problem with this approach is that performance and power are often sacrificed as the burden of operand supply is shifted from the register file to the memory due to the limited number of registers. In this paper, we investigate the use of a windowed register file to address this problem by providing more registers than allowed in the encoding. The registers are organized as a set of identical register windows where at each point in the execution there is a single active window. Special window management instructions are used to change the active window and to transfer values between windows. The goal of this design is to give the appearance of a large register file without compromising the instruction encoding. To support the windowed register file, we designed and implemented a novel graph partitioning based compiler algorithm that partitions virtual registers within a given procedure across multiple windows. On a 16-bit embedded processor with a parameterized register window, an average of 10% improvement in application performance and 7% reduction in system power was achieved as an eight-register design was scaled from one to four windows.