The TLB slice—a low-cost high-speed address translation mechanism

Authors:
George Taylor;Peter Davies;Michael Farmwald
Affiliations:
MIPS Computer Systems, 930 Arques Avenue, Sunnyvale, CA;MIPS Computer Systems, 930 Arques Avenue, Sunnyvale, CA;MIPS Computer Systems, 930 Arques Avenue, Sunnyvale, CA
Venue:
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Year:
1990

Abstract

The MIPS R6000 microprocessor relies on a new type of translation lookaside buffer, called a TLB slice, which is less than one-tenth the size of a conventional TLB and as fast as one multiplexer delay, yet has a high enough hit rate to be practical. The fast translation makes it possible to use a physical cache without adding a translation stage to the processor's pipeline. The small size makes it possible to include address translation on-chip, even in a technology with a limited number of devices.

The key idea behind the TLB slice is to have both a virtual tag and a physical tag on a physically-indexed cache. Because of the virtual tag, the TLB slice needs to hold only enough physical page number bits (typically 4 to 8) to complete the physical cache index, in contrast with a conventional TLB, which needs to hold both a virtual page number and a physical page number. The virtual page number is unnecessary because the TLB slice needs to provide only a hint for the translated physical address rather than a guarantee. The full physical page number is unnecessary because the cache hit logic is based on the virtual tag. Furthermore, if the cache is multi-level and references to the TLB slice are “shielded” by hits in a virtually indexed primary cache, the slice can get by with very few entries, once again lowering its cost and increasing its speed. With this mechanism, the simplicity of a physical cache can be combined with the speed of a virtual cache.
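The lookup path described in the abstract can be illustrated with a minimal sketch. This is not the R6000 design: the page size, line size, number of slice entries, and the use of low virtual page number bits to select an entry are all assumptions made for illustration, and the class and function names (`TLBSlice`, `Cache`, `cache_index`) are hypothetical. The physical tag and the full translation that would service a miss are omitted. The sketch shows only that the slice stores a few physical page number bits with no tag of its own, and that a hit is decided by the virtual tag in the cache.

```python
# All sizes below are illustrative assumptions, not figures from the paper.
PAGE_BITS = 12       # assumed 4 KiB pages
SLICE_BITS = 4       # assumed PPN bits needed to complete the cache index
SLICE_ENTRIES = 8    # assumed number of slice entries
LINE_BITS = 5        # assumed 32-byte cache lines
SLICE_MASK = (1 << SLICE_BITS) - 1


class TLBSlice:
    """Untagged table: each entry holds only a few physical page number bits."""

    def __init__(self):
        self.entries = [0] * SLICE_ENTRIES

    def lookup(self, vaddr):
        # Assumed indexing scheme: low VPN bits select the entry.
        # The result is a hint, since the entry carries no virtual tag.
        return self.entries[(vaddr >> PAGE_BITS) % SLICE_ENTRIES]

    def update(self, vaddr, paddr):
        ppn_bits = (paddr >> PAGE_BITS) & SLICE_MASK
        self.entries[(vaddr >> PAGE_BITS) % SLICE_ENTRIES] = ppn_bits


def cache_index(ppn_bits, vaddr):
    """Physical cache index: hinted PPN bits above the untranslated page offset."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return ((ppn_bits << PAGE_BITS) | offset) >> LINE_BITS


class Cache:
    """Physically indexed cache that decides hits by comparing a virtual tag."""

    def __init__(self):
        self.virtual_tags = {}  # cache index -> virtual page number

    def access(self, tlb_slice, vaddr):
        index = cache_index(tlb_slice.lookup(vaddr), vaddr)
        # A stale hint selects the wrong line; the virtual tag then mismatches,
        # so a wrong hint produces a miss rather than wrong data.
        return self.virtual_tags.get(index) == (vaddr >> PAGE_BITS)

    def fill(self, tlb_slice, vaddr, paddr):
        # On a miss, the real translation (not modeled here) supplies paddr.
        tlb_slice.update(vaddr, paddr)
        index = cache_index((paddr >> PAGE_BITS) & SLICE_MASK, vaddr)
        self.virtual_tags[index] = vaddr >> PAGE_BITS
```

In this sketch, two virtual pages that select the same slice entry overwrite each other's hint. The displaced page then misses on its next access even if its line is still resident, which is the cost of providing a hint rather than a guarantee.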