Trace cache: a low latency approach to high bandwidth instruction fetching. Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture.
Improving trace cache effectiveness with branch promotion and trace packing. Proceedings of the 25th Annual International Symposium on Computer Architecture.
Proceedings of the 27th Annual International Symposium on Computer Architecture.
Battery-aware static scheduling for distributed real-time embedded systems. Proceedings of the 38th Annual Design Automation Conference.
Communication architecture based power management for battery efficient system design. Proceedings of the 39th Annual Design Automation Conference.
Dynamic battery state aware approaches for improving battery utilization. Proceedings of CASES '02, the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems.
System lifetime extension by battery management: an experimental work. Proceedings of CASES '02, the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems.
Integrated power management for video streaming to mobile handheld devices. Proceedings of MULTIMEDIA '03, the 11th ACM International Conference on Multimedia.
Dynamic fault-tolerance and metrics for battery powered, failure-prone systems. Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design.
Maximizing efficiency of solar-powered systems by load matching. Proceedings of the 2004 International Symposium on Low Power Electronics and Design.
A model for battery lifetime analysis for organizing applications on a pocket computer. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
System-wide energy minimization for real-time tasks: lower bound and approximation. ACM Transactions on Embedded Computing Systems (TECS).
Power management in energy harvesting embedded systems with discrete service levels. Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design.
As superscalar processors grow wider, the large set of instructions fetched each cycle will inevitably span multiple noncontiguous basic blocks. The mechanism that fetches, aligns, and passes this set of instructions down the pipeline must do so as efficiently as possible, and the trace cache has emerged as the most promising technique for meeting this high-bandwidth, low-latency fetch requirement. A new fill unit scheme, the Sliding Window Fill Mechanism, is proposed as a method to efficiently populate the trace cache. This method exploits trace continuity and identifies probable start regions to improve the trace cache hit rate. Simulation yields a 7% average hit rate increase over the Rotenberg fill mechanism. When combined with branch promotion, trace cache hit rates improve by 19% on average, along with a 17% average improvement in fetch bandwidth.
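The core idea — building overlapping traces from a window of recently retired basic blocks, committing only those that begin at a probable trace start — can be sketched in simulation form. This is a minimal illustrative sketch, not the authors' actual hardware fill unit; the trace-length limit, the `is_probable_start` predicate, and the dictionary-as-trace-cache are all assumptions made for clarity.

```python
from collections import deque

MAX_TRACE_BLOCKS = 3  # assumed limit on basic blocks per trace


def sliding_window_fill(retired_blocks, is_probable_start, trace_cache):
    """Illustrative sliding-window fill: slide a window over the retired
    basic-block stream and commit a trace for every window whose first
    block is a probable trace start. Because the window advances one
    block at a time, successive traces overlap, capturing trace
    continuity (unlike a fill unit that restarts after each trace).
    """
    window = deque(maxlen=MAX_TRACE_BLOCKS)
    for block in retired_blocks:
        window.append(block)
        # Once the window is full, it holds one candidate trace.
        if len(window) == MAX_TRACE_BLOCKS and is_probable_start(window[0]):
            trace_cache[window[0]] = tuple(window)  # index trace by start block
    return trace_cache
```

For the retired stream A, B, C, D with A and B flagged as probable starts, this sketch commits two overlapping traces, (A, B, C) and (B, C, D), whereas a fill unit that restarts after each completed trace would capture only the first.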