An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Improving code density using compression techniques
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Enhanced code compression for embedded RISC processors
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Selective instruction compression for memory energy reduction in embedded systems
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Compiler techniques for code compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Using Slicing to Identify Duplication in Source Code
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Survey of code-size reduction methods
ACM Computing Surveys (CSUR)
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
A compressed memory hierarchy using an indirect index cache
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Improving Program Efficiency by Packing Instructions into Registers
Proceedings of the 32nd annual international symposium on Computer Architecture
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Address Correlation: Exceeding the Limits of Locality
IEEE Computer Architecture Letters
Proceedings of the conference on Design, automation and test in Europe
Multi-execution: multicore caching for data-similar executions
Proceedings of the 36th annual international symposium on Computer architecture
Hi-index | 0.00 |
Cache-content-duplication (CCD) occurs when there is a miss for a block in a cache and the entire content of the missed block is already in the cache in a block with a different tag. Caches aware of content-duplication can have lower miss penalty by fetching, on a miss to a duplicate block, directly from the cache instead of accessing lower in the memory hierarchy, and can have lower miss rates by allowing only blocks with unique content to enter a cache. This work examines the potential of CCD for instruction caches. We show that CCD is a frequent phenomenon and that an idealized duplication-detection mechanism for instruction caches has the potential to increase performance of an out-of-order processor, with a 16KB, 8-way, 8 instructions per block instruction cache, often by more than 10% and up to 36%. This work also proposes CATCH, a hardware mechanism for dynamically detecting CCD for instruction caches. Experimental results for an out-of-order processor show that a duplication-detection mechanism with a 1.38KB cost captures on average 58% of the CCD's idealized potential.