Improvements to graph coloring register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Memory access coalescing: a technique for eliminating redundant memory accesses
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Array SSA form and its use in parallelization
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Quality and speed in linear-scan register allocation
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Linear scan register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Bidwidth analysis with application to silicon compilation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Bit section instruction set extension of ARM for embedded applications
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Bitwidth aware global register allocation
POPL '03 Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A Representation for Bit Section Based Analysis and Optimization
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Compiler Analysis of the Value Ranges for Variables
IEEE Transactions on Software Engineering
Speculative subword register allocation in embedded processors
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Subregion analysis and bounds check elimination for high level arrays
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Improved bitwidth-aware variable packing
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Embedded processors depend on register files for performance, just like general-purpose processors in desktop and server systems. However, unlike general-purpose processors, the power consumption of register files poses a significant challenge for embedded processors, making it desirable for embedded processors to use as few registers as possible. Past research has indicated the potential for leveraging bitwidth analysis and bitwidth-aware register allocation to reduce register usage in embedded applications. This paper makes the following contributions in evaluating and enhancing bitwidth-aware register allocation for embedded applications. First, we compare the Tallam-Gupta bitwidth analysis with an idealized limit study, and show significant opportunities for enhancements. Second, we show how bitwidth-aware register allocation can be enhanced by enhanced bitwidth analysis for scalar and array variables, and also by enhanced coalescing of variables. Third, we use our prototype implementation of bitwidth-aware register allocation in gcc to compare the number of dynamic spill load/store instructions resulting from a) bitwidth-unaware allocation, b) bitwidth-aware allocation, c) enhanced bitwidth-aware allocation, and d) ideal profile-driven bitwidth-aware allocation. Our results show that our enhancements can reduce the number of dynamic spill load/store instructions to between 3% and 27% of the number obtained from the Tallam-Gupta algorithm.