Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
Lightweight runtime control flow analysis for adaptive loop caching
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Embedded Systems Design
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.01 |
Dynamically-loaded tagless loop caching reduces instruction fetch power for embedded software with small loops, but only supports simple loops without taken branches. Preloaded tagless loop caching supports complex loops with branches and thus can reduce power further, but has a limit on the total number of instructions cached. We show that each does well on particular benchmarks, but neither is best across all of those benchmarks. We present a new hybrid loop cache that only preloads the complex loops, while dynamically loading other loops, thus achieving the strengths of each approach. We demonstrate better power savings than either previous approach alone.