Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Register connection: a new approach to adding registers into instruction set architectures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor
Digital Technical Journal
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Vectorizing for a SIMdD DSP architecture
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Increasing the number of effective registers in a low-power processor using a windowed register file
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
The IBM system/360 model 91: machine philosophy and instruction-handling
IBM Journal of Research and Development
VEGAS: soft vector processor with scratchpad memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Scalability evaluation of a polymorphic register file: A CG case study
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
An efficient non-blocking multithreaded embedded system
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Separable 2d convolution with polymorphic register files
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Hi-index | 0.00 |
Conventional register file architectures cannot optimally exploit temporal locality in data references due to their limited capacity and static encoding of register addresses in instructions. In conventional embedded architectures, the register file capacity cannot be increased without resorting to longer instruction words. Similarly, loop unrolling is often required to exploit locality in the register file accesses across iterations because naming registers statically is inflexible. Both optimizations lead to significant code size increases, which is undesirable in embedded systems. In this paper, we introduce the Register Pointer Architecture (RPA), which allows registers to be accessed indirectly through register pointers. Indirection allows a larger register file to be used without increasing the length of instruction words. Additional register file capacity allows many loads and stores, such as those introduced by spill code, to be eliminated, which improves performance and reduces energy consumption. Moreover, indirection affords additional flexibility in naming registers, which reduces the need to apply loop unrolling in order to maximize reuse of register allocated variables.