Avoiding conflict misses dynamically in large direct-mapped caches

Authors:
Brian N. Bershad;Dennis Lee;Theodore H. Romer;J. Bradley Chen
Affiliations:
Department of Computer Science and Engineering, University of Washington, Seattle, WA;Department of Computer Science and Engineering, University of Washington, Seattle, WA;Department of Computer Science and Engineering, University of Washington, Seattle, WA;School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA
Venue:
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Year:
1994

Abstract

This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer, which detects conflicts by recording and summarizing a history of cache misses, and a software policy within the operating system's virtual memory system, which removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected. Using trace-driven simulation of applications and the operating system, we show that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set-associative cache of equivalent size and speed, but with lower hardware cost and complexity.
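The detect-and-remap idea in the abstract can be illustrated with a toy simulation. This is only a minimal sketch, not the paper's design: the per-page miss counter standing in for the CML buffer, the `threshold` parameter, the cache geometry, and the `pick_frame` policy (move the page to the least-loaded other cache color) are all assumptions made for illustration, and the cost of copying a page is ignored.

```python
from collections import Counter

# Hypothetical geometry: a physically indexed direct-mapped cache of 8 pages.
PAGE_SIZE = 4096
LINE_SIZE = 32
NUM_COLORS = 8                                   # page-sized regions of the cache
NUM_LINES = NUM_COLORS * PAGE_SIZE // LINE_SIZE


class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, paddr):
        """Return True on a hit; on a miss, install the line."""
        line = paddr // LINE_SIZE
        index = line % NUM_LINES
        hit = self.tags[index] == line
        self.tags[index] = line
        return hit


def pick_frame(mapping, old_color):
    """Assumed policy: a free frame in the least-loaded color other than old_color."""
    used = set(mapping.values())
    load = Counter(frame % NUM_COLORS for frame in used)
    color = min((c for c in range(NUM_COLORS) if c != old_color),
                key=lambda c: load[c])
    frame = color
    while frame in used:
        frame += NUM_COLORS                      # next frame with the same color
    return frame


def simulate(trace, page_table, threshold=None):
    """Count misses for a trace of virtual addresses.

    page_table maps virtual page number -> physical frame number.
    threshold=None disables remapping (plain direct-mapped cache).
    """
    cache = DirectMappedCache()
    mapping = dict(page_table)
    miss_counts = Counter()                      # stand-in for the CML buffer
    misses = 0
    for vaddr in trace:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = mapping[vpn]
        if cache.access(frame * PAGE_SIZE + offset):
            continue
        misses += 1
        if threshold is None:
            continue
        miss_counts[frame] += 1
        if miss_counts[frame] >= threshold:
            old_color = frame % NUM_COLORS
            mapping[vpn] = pick_frame(mapping, old_color)
            # Forget the miss history of the color whose conflict was just handled.
            for f in list(miss_counts):
                if f % NUM_COLORS == old_color:
                    del miss_counts[f]
    return misses


# Two virtual pages whose frames share a cache color, touched alternately.
trace = [vpn * PAGE_SIZE for _ in range(1000) for vpn in (0, 8)]
table = {0: 0, 8: 8}
baseline = simulate(trace, table)
remapped = simulate(trace, table, threshold=16)
print(baseline, remapped)
```

In this contrived trace every access misses without remapping, because both pages compete for the same cache line. Once the assumed threshold is reached, one page moves to a different color and the remaining accesses hit. Real workloads and the actual CML hardware are considerably more involved than this sketch.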