Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving code density using compression techniques
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
25 years of the international symposia on Computer architecture (selected papers)
Enhanced code compression for embedded RISC processors
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Compiler techniques for code compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Survey of code-size reduction methods
ACM Computing Surveys (CSUR)
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Address Correlation: Exceeding the Limits of Locality
IEEE Computer Architecture Letters
Multi-execution: multicore caching for data-similar executions
Proceedings of the 36th annual international symposium on Computer architecture
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches
ACM Transactions on Architecture and Code Optimization (TACO)
Extrinsic and intrinsic text cloning
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Hi-index | 0.00 |
Cache-Content-Duplication (CCD) occurs when there is a miss for a block in a cache and the entire content of the missed block is already in the cache in a block with a different tag. Caches aware of content-duplication can have lower miss rates by allowing only blocks with unique content to enter a cache. This work examines the potential of CCD for instruction caches. We show that CCD is a frequent phenomenon and that an idealized duplication-detection mechanism for instruction caches has the potential to increase performance of an out-of-order processor, with a 2-way eight instruction per block 16KB instruction cache, often by more than 5% and up to 20%. This work also proposes CATCH, a hardware based mechanism for dynamically detecting CCD. Experimental results for an out-of-order processor show that a CATCH with a 2.32KB cost usually captures 60% or more of the CCD's idealized potential.