Cache Operations by MRU Change
IEEE Transactions on Computers
The VMP multiprocessor: initial experience, refinements, and performance evaluation
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Virtual memory primitives for user programs
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Modern operating systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Informed prefetching and caching
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Adaptive page replacement based on memory reference behavior
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A real-time garbage collector based on the lifetimes of objects
Communications of the ACM
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The eclipse operating system: providing quality of service via reservation domains
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Heterogeneous memory management for embedded systems
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Let caches decay: reducing leakage energy via exploitation of cache generational behavior
ACM Transactions on Computer Systems (TOCS)
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
FlexCache: A Framework for Flexible Compiler Generated Data Caching
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Data compression for improving SPM behavior
Proceedings of the 41st annual Design Automation Conference
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
BB-GC: Basic-Block Level Garbage Collection
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Cost-Effective Main Memory Organization for Future Servers
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A case for multi-level main memory
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A compressed memory hierarchy using an indirect index cache
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
The V-Way Cache: Demand Based Associativity via Global Replacement
Proceedings of the 32nd annual international symposium on Computer Architecture
Dataflow analysis for energy-efficient scratch-pad memory management
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Maximizing data reuse for minimizing memory space requirements and execution cycles
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Dynamic scratch-pad memory management for irregular array access patterns
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A Case for MLP-Aware Cache Replacement
Proceedings of the 33rd annual international symposium on Computer Architecture
Dynamic allocation for scratch-pad memory using compile-time decisions
ACM Transactions on Embedded Computing Systems (TECS)
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Two-level mapping based cache index selection for packet forwarding engines
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Software-based instruction caching for embedded processors
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Heap data allocation to scratch-pad memory in embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
A cache design for high performance embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Recursive function data allocation to scratch-pad memory
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Scratch-pad memory allocation without compiler support for java applications
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient dynamic heap allocation of scratch-pad memory
Proceedings of the 7th international symposium on Memory management
Coordinated concurrent memory accesses on a reconfigurable multimedia accelerator
Microprocessors & Microsystems
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the 23rd international conference on Supercomputing
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
Divide-and-conquer: a bubble replacement for low level caches
Proceedings of the 23rd international conference on Supercomputing
Adaptive scratch pad memory management for dynamic behavior of multimedia applications
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive line placement with the set balancing cache
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing performance non-determinism via cache-aware page allocation strategies
Proceedings of the first joint WOSP/SIPEW international conference on Performance engineering
A hardware/software framework for instruction and data scratchpad memory allocation
ACM Transactions on Architecture and Code Optimization (TACO)
Customized placement for high performance embedded processor caches
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
ECOOP'07 Proceedings of the 2007 conference on Object-oriented technology
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Compiler-directed scratch pad memory optimization for embedded multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Base-delta-immediate compression: practical data compression for on-chip caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
TLC: a tag-less cache for reducing dynamic first level cache energy
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
As DRAM access latencies approach a thousand instruction-execution times and on-chip caches grow to multiple megabytes, it is not clear that conventional cache structures continue to be appropriate. Two key features—full associativity and software management—have been used successfully in the virtual-memory domain to cope with disk access latencies. Future systems will need to employ similar techniques to deal with DRAM latencies. This paper presents a practical, fully associative, software-managed secondary cache system that provides performance competitive with or superior to traditional caches without OS or application involvement. We see this structure as the first step toward OS- and application-aware management of large on-chip caches.This paper has two primary contributions: a practical design for a fully associative memory structure, the indirect index cache (IIC), and a novel replacement algorithm, generational replacement, that is specifically designed to work with the IIC. We analyze the behavior of an IIC with generational replacement as a drop-in, transparent substitute for a conventional secondary cache. We achieve miss rate reductions from 8% to 85% relative to a 4-way associative LRU organization, matching or beating a (practically infeasible) fully associative true LRU cache. Incorporating these miss rates into a rudimentary timing model indicates that the IIC/generational replacement cache could be competitive with a conventional cache at today's DRAM latencies, and will outperform a conventional cache as these CPU-relative latencies grow.