Machine organization of the IBM RISC System/6000 processor
IBM Journal of Research and Development
Dynamic base register caching: a technique for reducing address bus width
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IEEE Transactions on Computers
Generating representative Web workloads for network and server performance evaluation
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Accurate indirect branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
The YAGS branch prediction scheme
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Swap compression: resurrecting old ideas
Software—Practice & Experience
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
Frequent value compression in data caches
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An on-chip cache compression technique to reduce decompression overhead and design complexity
Journal of Systems Architecture: the EUROMICRO Journal
Effective algorithms for cache-level compression
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Frequent value locality and its applications
ACM Transactions on Embedded Computing Systems (TECS)
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Parallel compression with cooperative dictionary construction
DCC '96 Proceedings of the Conference on Data Compression
Creating a wider bus using caching techniques
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Design and Evaluation of a Selective Compressed Memory System
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Hardware-Assisted Data Compression for Energy Minimization in Systems with Embedded Processors
Proceedings of the conference on Design, automation and test in Europe
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
The Effects of Mispredicted-Path Execution on Branch Prediction Structures
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
The case for compressed caching in virtual memory systems
ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
IBM memory expansion technology (MXT)
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
A compressed memory hierarchy using an indirect index cache
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
An analytical model for software-only main memory compression
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A Robust Main-Memory Compression Scheme
Proceedings of the 32nd annual international symposium on Computer Architecture
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory State Compressors for Giga-Scale Checkpoint/Restore
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Restrictive Compression Techniques to Increase Level 1 Cache Capacity
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Fully associative cache partitioning with don't care bits for real-time applications
ACM SIGBED Review - Special issue: IEEE RTAS 2005 work-in-progress
Adaptive main memory compression
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Proceedings of the 21st annual international conference on Supercomputing
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Improving disk bandwidth-bound applications through main memory compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Scalable packet classification using interpreting: a cross-platform multi-core solution
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Leakage energy reduction in cache memory by data compression
ACM SIGARCH Computer Architecture News - Special issue: ALPS '07---advanced low power systems
ICESS '07 Proceedings of the 3rd international conference on Embedded Software and Systems
SWAT '08 Proceedings of the 11th Scandinavian workshop on Algorithm Theory
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Adaptive data compression for high-performance low-power on-chip networks
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 23rd international conference on Supercomputing
Cancellation of loads that return zero using zero-value caches
Proceedings of the 23rd international conference on Supercomputing
Multi-execution: multicore caching for data-similar executions
Proceedings of the 36th annual international symposium on Computer architecture
Scaling the bandwidth wall: challenges in and avenues for CMP scaling
Proceedings of the 36th annual international symposium on Computer architecture
Online cache state dumping for processor debug
Proceedings of the 46th Annual Design Automation Conference
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Brief announcement: flashcrowding in tiled multiprocessors under thermal constraints
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Efficient lookahead routing and header compression for multicasting in networks-on-chip
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Cache aware compression for processor debug support
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Decoupled zero-compressed memory
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
C-pack: a high-performance microprocessor cache compression algorithm
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory-, bandwidth-, and power-aware multi-core for a graph database workload
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic dictionary-based data compression for level-1 caches
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Dynamic co-allocation of level one caches
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
HICAMP: architectural support for efficient concurrency-safe shared structured data access
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A dual-phase compression mechanism for hybrid DRAM/PCM main memory architectures
Proceedings of the great lakes symposium on VLSI
Proceedings of the 26th ACM international conference on Supercomputing
ER: elastic RESET for low power and long endurance MLC based phase change memory
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Base-delta-immediate compression: practical data compression for on-chip caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Linearly compressed pages: a main memory compression framework with low complexity and low latency
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Linearly compressed pages: a low-complexity, low-latency main memory compression framework
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Unleashing the potential of MLC STT-RAM caches
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Modern processors use two or more levels ofcache memories to bridge the rising disparity betweenprocessor and memory speeds. Compression canimprove cache performance by increasing effectivecache capacity and eliminating misses. However,decompressing cache lines also increases cache accesslatency, potentially degrading performance.In this paper, we develop an adaptive policy thatdynamically adapts to the costs and benefits of cachecompression. We propose a two-level cache hierarchywhere the L1 cache holds uncompressed data and the L2cache dynamically selects between compressed anduncompressed storage. The L2 cache is 8-way set-associativewith LRU replacement, where each set can storeup to eight compressed lines but has space for only fouruncompressed lines. On each L2 reference, the LRUstack depth and compressed size determine whethercompression (could have) eliminated a miss or incurs anunnecessary decompression overhead. Based on thisoutcome, the adaptive policy updates a single globalsaturating counter, which predicts whether to allocatelines in compressed or uncompressed form.We evaluate adaptive cache compression usingfull-system simulation and a range of benchmarks. Weshow that compression can improve performance formemory-intensive commercial workloads by up to 17%.However, always using compression hurts performancefor low-miss-rate benchmarks-due to unnecessarydecompression overhead-degrading performance byup to 18%. By dynamically monitoring workload behavior,the adaptive policy achieves comparable benefitsfrom compression, while never degrading performanceby more than 0.4%.