Compiler-directed code restructuring for reducing data TLB energy
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Heterogeneously tagged caches for low-power embedded systems with virtual memory support
ACM Transactions on Design Automation of Electronic Systems (TODAES)
B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
Hi-index | 0.00 |
Address translation using the Translation Lookaside Buffer (TLB) consumes as much as 16% of the chip power on some processors because of its high associativity and access frequency. While prior work has looked into optimizing this structure at the circuit and architectural levels, this paper takes a different approach of optimizing its power by reducing the number of data TLB (dTLB) lookups for data references. The main idea is to keep translations in a set of translation registers, and intelligently use them in software to directly generate the physical addresses without going through the dTLB. The software has to work within the confines of the translation registers provided by the hardware, and has to maximize the reuse of such translations to be effective. We propose strategies and code transformations for achieving this in array-based and pointer-based codes, looking to optimize data accesses. Results with a suite of Spec95 array-based and pointer-based codes show dTLB energy savings of up to 73% and 88%, respectively, compared to directly using the dTLB for all references. Despite the small increase in instructions executed with our mechanisms, the approach can in fact provide performance benefits in certain cases.