Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses

  • Authors:
  • Rui Min;Yiming Hu

  • Affiliations:
  • Univ. of Cincinnati, Cincinnati, OH; Univ. of Cincinnati, Cincinnati, OH

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2001


Abstract

Modern CPUs often use large physically indexed caches that are direct-mapped or have low associativities. Such caches do not interact well with virtual memory systems: an improperly placed physical page ends up in the wrong place in the cache, causing excessive conflicts with other cached pages. Page coloring has been proposed to reduce these conflict misses by carefully placing pages in physical memory. While page coloring works well for some applications, many factors limit its performance: it constrains the page placement system and may increase swapping traffic. In this paper, we propose a novel and simple architecture, called color-indexed, physically tagged caches, which can significantly reduce conflict misses. With some simple modifications to the TLB (Translation Look-aside Buffer), the new architecture decouples the addresses of the cache from the addresses of the main memory. Since the cache addresses no longer depend on the physical memory addresses, the system can freely place data in any cache page to minimize conflict misses, without affecting the paging system. Extensive trace-driven simulation results show that our design performs much better than traditional page coloring techniques. The new scheme enables a direct-mapped cache to achieve hit ratios very close to or better than those of a two-way set-associative cache. Moreover, the architecture does not increase cache access latency, which is a drawback of set-associative caches. The hardware overhead is minimal. We show that our scheme can reduce the cache size by 50 percent without sacrificing performance. A two-way set-associative cache that uses this strategy can perform very close to a fully associative cache.
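To make the conflict problem concrete, the following sketch (not from the paper; all cache and page parameters are illustrative assumptions) shows how a physically indexed, direct-mapped cache derives a page's "color" from its physical frame number, and why two pages sharing a color evict each other's lines. The paper's scheme sidesteps this by letting the TLB assign the cache color independently of the physical frame.

```python
# Illustrative parameters (assumed, not taken from the paper).
PAGE_SIZE = 4096          # 4 KiB pages
CACHE_SIZE = 64 * 1024    # 64 KiB direct-mapped cache
LINE_SIZE = 64            # bytes per cache line

# Number of distinct cache pages ("colors") in the cache.
NUM_COLORS = CACHE_SIZE // PAGE_SIZE   # 16 here

def page_color(phys_addr):
    """Cache page (color) that a physical address maps to."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def cache_index(phys_addr):
    """Direct-mapped cache line index for a physical address."""
    return (phys_addr // LINE_SIZE) % (CACHE_SIZE // LINE_SIZE)

# Two physical pages whose frame numbers differ by a multiple of
# NUM_COLORS share a color, so every line of one page maps onto the
# corresponding line of the other -- a source of conflict misses.
a = 0 * PAGE_SIZE
b = NUM_COLORS * PAGE_SIZE
assert page_color(a) == page_color(b)
assert cache_index(a) == cache_index(b)

# Page coloring avoids this by choosing frames with distinct colors:
c = (NUM_COLORS + 1) * PAGE_SIZE
assert page_color(a) != page_color(c)
```

Under the paper's color-indexed design, the color used for `cache_index` would come from a TLB-assigned field rather than from `phys_addr`, so the allocator is free to pick any physical frame while still avoiding cache conflicts.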