An in-cache address translation mechanism
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
MIPS RISC architecture
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Inside Windows NT
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Consistency management for virtually indexed caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Protection traps and alternatives for memory management of an object-oriented language
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
The impact of operating system structure on memory system performance
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Efficient software-based fault isolation
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
The TLB slice—a low-cost high-speed address translation mechanism
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Page allocation to reduce access time of physical caches
Page allocation to reduce access time of physical caches
Aspects of Cache Memory and Instruction
Aspects of Cache Memory and Instruction
The measured performance of personal computer operating systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A system level perspective on branch architecture performance
Proceedings of the 28th annual international symposium on Microarchitecture
The measured performance of personal computer operating systems
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SHRIMP performance monitor: design and applications
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
ACM SIGARCH Computer Architecture News
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Performance counters and state sharing annotations: a unified approach to thread locality
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Self-paging in the Nemesis operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache-optimal methods for bit-reversals
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
ACM SIGPLAN Notices
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Improving memory performance of sorting algorithms
Journal of Experimental Algorithmics (JEA)
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
IEEE Transactions on Computers
IEEE Transactions on Computers
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC
IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Improving the Data Cache Performance of Multiprocessor Operating Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Real time aspects of cluster based caches
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Dynamic Partitioning of Shared Cache Memory
The Journal of Supercomputing
Locality-Aware Process Scheduling for Embedded MPSoCs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Cache conflict resolution through detection, analysis and dynamic remapping of active pages
ACM-SE 38 Proceedings of the 38th annual on Southeast regional conference
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Impulse: Memory system support for scientific applications
Scientific Programming
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
Cache-aware scheduling and analysis for multicores
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Enabling software management for multicore caches with a lightweight hardware support
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Data layout for cache performance on a multithreaded architecture
Transactions on high-performance embedded architectures and compilers III
Page coloring synchronization for improving cache performance in virtualization environment
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part III
Proceedings of the 2nd ACM Symposium on Cloud Computing
A hybrid hardware/software generated prefetching thread mechanism on chip multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Operating system support for multimedia systems
Computer Communications
Hi-index | 0.01 |
This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer that detects conflicts by recording and summarizing a history of cache misses, and a software policy within the operating system's virtual memory system that removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected. Using trace-driven simulation of applications and the operating system, we show that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set associative cache of equivalent size and speed, although with lower hardware cost and complexity.