A fully associative software-managed cache design

Authors:
Erik G. Hallnor;Steven K. Reinhardt
Affiliations:
Advanced Computer Architecture Laboratory, Dept. of Electrical Engineering and Computer Science, The University of Michigan, 1301 Beal Ave., Ann Arbor, MI;Advanced Computer Architecture Laboratory, Dept. of Electrical Engineering and Computer Science, The University of Michigan, 1301 Beal Ave., Ann Arbor, MI
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Citing 23
Cited 55

Cache Operations by MRU Change

IEEE Transactions on Computers
The VMP multiprocessor: initial experience, refinements, and performance evaluation

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Virtual memory primitives for user programs

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Modern operating systems

Modern operating systems
Page placement algorithms for large real-indexed caches

ACM Transactions on Computer Systems (TOCS)
Architectural support for translation table management in large address space machines

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Avoiding conflict misses dynamically in large direct-mapped caches

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Informed prefetching and caching

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-directed page coloring for multiprocessors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Adaptive page replacement based on memory reference behavior

SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Exploiting spatial locality in data caches using spatial footprints

Proceedings of the 25th annual international symposium on Computer architecture
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Reducing cache misses using hardware and software page placement

ICS '99 Proceedings of the 13th international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A real-time garbage collector based on the lifetimes of objects

Communications of the ACM
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
DASC cache

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Software-Managed Address Translation

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The eclipse operating system: providing quality of service via reservation domains

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference

Cache decay: exploiting generational behavior to reduce cache leakage power

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Heterogeneous memory management for embedded systems

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Let caches decay: reducing leakage energy via exploitation of cache generational behavior

ACM Transactions on Computer Systems (TOCS)
An optimal memory allocation scheme for scratch-pad-based embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
FlexCache: A Framework for Flexible Compiler Generated Data Caching

IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
User-controllable coherence for high performance shared memory multiprocessors

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Data compression for improving SPM behavior

Proceedings of the 41st annual Design Automation Conference
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
BB-GC: Basic-Block Level Garbage Collection

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Cost-Effective Main Memory Organization for Future Servers

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A case for multi-level main memory

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A compressed memory hierarchy using an indirect index cache

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
The V-Way Cache: Demand Based Associativity via Global Replacement

Proceedings of the 32nd annual international symposium on Computer Architecture
Dataflow analysis for energy-efficient scratch-pad memory management

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Maximizing data reuse for minimizing memory space requirements and execution cycles

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Dynamic scratch-pad memory management for irregular array access patterns

Proceedings of the conference on Design, automation and test in Europe: Proceedings
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems (TECS)
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Two-level mapping based cache index selection for packet forwarding engines

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Software-based instruction caching for embedded processors

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Heap data allocation to scratch-pad memory in embedded systems

Journal of Embedded Computing - Cache exploitation in embedded systems
A cache design for high performance embedded systems

Journal of Embedded Computing - Cache exploitation in embedded systems
Unified microprocessor core storage

Proceedings of the 4th international conference on Computing frontiers
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Recursive function data allocation to scratch-pad memory

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Scratch-pad memory allocation without compiler support for java applications

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient dynamic heap allocation of scratch-pad memory

Proceedings of the 7th international symposium on Memory management
Coordinated concurrent memory accesses on a reconfigurable multimedia accelerator

Microprocessors & Microsystems
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

ACM Transactions on Embedded Computing Systems (TECS)
Zero-content augmented caches

Proceedings of the 23rd international conference on Supercomputing
Less reused filter: improving l2 cache performance via filtering less reused lines

Proceedings of the 23rd international conference on Supercomputing
Divide-and-conquer: a bubble replacement for low level caches

Proceedings of the 23rd international conference on Supercomputing
Adaptive scratch pad memory management for dynamic behavior of multimedia applications

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive line placement with the set balancing cache

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing performance non-determinism via cache-aware page allocation strategies

Proceedings of the first joint WOSP/SIPEW international conference on Performance engineering
A hardware/software framework for instruction and data scratchpad memory allocation

ACM Transactions on Architecture and Code Optimization (TACO)
Customized placement for high performance embedded processor caches

ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Implementation, compilation, optimization of object-oriented languages, programs and systems: report on the workshop ICOOOLPS 2007 at ECOOP 2007

ECOOP'07 Proceedings of the 2007 conference on Object-oriented technology
Minimizing inter-task interferences in scratch-pad memory usage for reducing the energy consumption of multi-task systems

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
The ZCache: Decoupling Ways and Associativity

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Compiler-directed scratch pad memory optimization for embedded multiprocessors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
An energy-efficient adaptive hybrid cache

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Base-delta-immediate compression: practical data compression for on-chip caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
TLC: a tag-less cache for reducing dynamic first level cache energy

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

As DRAM access latencies approach a thousand instruction-execution times and on-chip caches grow to multiple megabytes, it is not clear that conventional cache structures continue to be appropriate. Two key features—full associativity and software management—have been used successfully in the virtual-memory domain to cope with disk access latencies. Future systems will need to employ similar techniques to deal with DRAM latencies. This paper presents a practical, fully associative, software-managed secondary cache system that provides performance competitive with or superior to traditional caches without OS or application involvement. We see this structure as the first step toward OS- and application-aware management of large on-chip caches.This paper has two primary contributions: a practical design for a fully associative memory structure, the indirect index cache (IIC), and a novel replacement algorithm, generational replacement, that is specifically designed to work with the IIC. We analyze the behavior of an IIC with generational replacement as a drop-in, transparent substitute for a conventional secondary cache. We achieve miss rate reductions from 8% to 85% relative to a 4-way associative LRU organization, matching or beating a (practically infeasible) fully associative true LRU cache. Incorporating these miss rates into a rudimentary timing model indicates that the IIC/generational replacement cache could be competitive with a conventional cache at today's DRAM latencies, and will outperform a conventional cache as these CPU-relative latencies grow.