This article presents a simple mechanism that increases the utilization of a small trace cache while simultaneously reducing its power consumption. The mechanism stores traces selectively (filtering), based on a concept new to computer architecture: random sampling. The sampling filter exploits the “hot/cold trace” principle, which divides the population of traces into two groups. The first group contains “hot traces,” which are executed many times from the trace cache and contribute the majority of committed instructions. The second group contains “cold traces,” which are rarely executed but are responsible for the majority of writes to an unfiltered cache. The sampling filter selects traces without any prior knowledge of their quality. However, because most writes to the cache are of “cold traces,” it statistically filters out those traces, reducing cache turnover and eventually leading to higher-quality traces residing in the cache. In contrast with previously proposed filters, which perform bookkeeping for all traces in the program, the sampling filter can be implemented with minimal hardware. Results show that the sampling filter can increase the number of hits per build (utilization) by a factor of 38, reduce the miss rate by 20%, and improve the performance-power efficiency by 15%. Further improvements can be obtained by extending the basic sampling filter: allowing “hot traces” to bypass the sampling filter, combining sampling with previously proposed filters, and changing the replacement policy in the trace cache. These techniques, combined with the sampling filter, can reduce the miss rate of the trace cache by up to 40%. Although the effectiveness of the sampling filter is demonstrated for a trace cache, the sampling principle is applicable to other micro-architectural structures with similar access patterns.
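The sampling idea can be illustrated with a small software model. The sketch below is not the authors' hardware design. It is a toy simulation under stated assumptions: an LRU-managed cache, a fixed admission probability, and a synthetic workload in which half of the accesses go to a small set of recurring hot traces and the other half to cold traces that never repeat. All names (`SampledTraceCache`, `admit_probability`, `synthetic_workload`, `run`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import OrderedDict


class SampledTraceCache:
    """Toy LRU trace cache whose fills pass through a random sampling filter."""

    def __init__(self, capacity, admit_probability, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.admit_probability = admit_probability  # assumed fixed sampling rate
        self.rng = rng if rng is not None else random.Random(0)
        self.entries = OrderedDict()  # trace_id -> placeholder, in LRU order
        self.hits = 0
        self.misses = 0
        self.builds = 0  # traces actually written into the cache

    def access(self, trace_id):
        """Look up a trace; on a miss, build it only if the coin flip admits it."""
        if trace_id in self.entries:
            self.entries.move_to_end(trace_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        # The filter has no knowledge of trace quality: it flips a biased coin.
        if self.rng.random() < self.admit_probability:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[trace_id] = True
            self.builds += 1
        return False

    def hits_per_build(self):
        """Utilization: hits served per trace written into the cache."""
        return self.hits / self.builds if self.builds else 0.0


def synthetic_workload(length, hot_traces, rng):
    """Yield trace ids: 50% recurring hot traces, 50% never-repeating cold ones."""
    cold_id = hot_traces  # cold ids start after the hot id range
    for _ in range(length):
        if rng.random() < 0.5:
            yield rng.randrange(hot_traces)
        else:
            yield cold_id
            cold_id += 1


def run(admit_probability, length=20000, capacity=16, hot_traces=32, seed=1):
    """Replay the synthetic workload through a cache and return the cache."""
    cache = SampledTraceCache(capacity, admit_probability, random.Random(seed))
    workload = synthetic_workload(length, hot_traces, random.Random(seed + 1))
    for trace_id in workload:
        cache.access(trace_id)
    return cache


if __name__ == "__main__":
    for p in (1.0, 0.1):
        c = run(p)
        print(f"p={p}: builds={c.builds}, hits/build={c.hits_per_build():.2f}")
```

With `admit_probability=1.0` the model behaves as an unfiltered cache, where every miss causes a build. Lowering the probability cuts the number of builds. Because a hot trace gets many chances to be sampled while a cold trace gets only one, the admitted traces are mostly hot ones. The numbers this toy produces depend entirely on the assumed workload and should not be compared with the results reported in the article.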