Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Predictive sequential associative cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Reducing energy and delay using efficient victim caches
Proceedings of the 2003 international symposium on Low power electronics and design
Scaling the issue window with look-ahead latency prediction
Proceedings of the 18th annual international conference on Supercomputing
Location cache: a low-power L2 cache system
Proceedings of the 2004 international symposium on Low power electronics and design
A Hardware-Software Platform for Intrusion Prevention
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
Reducing the Energy of Speculative Instruction Schedulers
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Reducing energy of virtual cache synonym lookup using bloom filters
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 20th annual international conference on Supercomputing
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
An energy-delay efficient 2-level data cache architecture for embedded system
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 2nd ACM Symposium on Cloud Computing
Enhancing last-level cache performance by block bypassing and early miss determination
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Efficient system-on-chip energy management with a segmented bloom filter
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
SkipCache: miss-rate aware cache management
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A dual grain hit-miss detector for large die-stacked DRAM caches
Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
As the performance gap between the processor cores and the memory subsystem increases, designers are forced to develop new latency hiding techniques. Arguably, the most common technique is to utilize multi-level caches. Each new generation of processors is equippedwith higher levels of memory hierarchy with increasing sizes at each level. In this paper, we propose 5 different techniques that will reduce the data access times and power consumption in processors with multi-level caches. Using the information about the blocks placed into and replaced from the caches, the techniques quickly determine whether an access at any cache level will be a miss. The accesses that are identified to miss are aborted. The structures usedto recognize misses are much smaller than the cache structures. Consequently the data access times and power consumption is reduced. Using SimpleScalar simulator, we study the performance of these techniques for a processor with 5 cache levels. The best technique is able to abort 53.1% of the misses on average in SPEC2000 applications. Using these techniques, the execution time of the applications are reduced by up to 12.4% (5.4% on average), and the power consumption of the caches is reduced by as much as 11.6% (3.8% on average).