Translation caching: skip, don't walk (the page table)

Authors:
Thomas W. Barr; Alan L. Cox; Scott Rixner
Affiliations:
Rice University, Houston, TX, USA (all authors)
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Abstract

This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page table. In particular, these caches accelerate the page table walk that occurs after a miss in the Translation Lookaside Buffer (TLB). This paper shows that the most effective MMU caches are translation caches, which store partial translations and allow the page walk hardware to skip one or more levels of the page table. In recent years, both AMD and Intel processors have implemented MMU caches. However, their implementations are quite different and represent distinct points in the design space. This paper introduces three new MMU cache structures that round out the design space and directly compares the effectiveness of all five organizations (the two existing designs plus the three new ones). This comparison shows that two of the newly introduced structures, both of which are translation cache variants, are better than existing structures in many situations. Finally, this paper contributes to the age-old discourse concerning the relative effectiveness of different page table organizations. Generally speaking, earlier studies concluded that organizations based on hashing, such as the inverted page table, outperformed organizations based on radix trees for supporting large virtual address spaces. However, these studies did not account for the possibility of caching page table entries from the higher levels of the radix tree. This paper shows that any of the five MMU cache structures will reduce radix tree page table DRAM accesses far below those of an inverted page table.
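The abstract's central idea is that a translation cache stores partial translations so that the walker can skip upper levels of the radix tree. A small simulation can illustrate this. The sketch below is a hypothetical model, not the paper's simulator or any vendor's hardware design. It assumes an x86-64-style four-level table (PML4, PDP, PD, PT) with 9 index bits per level and 4 KiB pages, and it represents page tables as Python dicts. It also assumes an unbounded translation cache keyed by (depth, upper virtual-address bits), with no replacement policy. The names `PageTable`, `TranslationCache`, and `walk` are illustrative only.

```python
"""Hypothetical model of a radix page walk with a translation cache.

Assumes an x86-64-style 4-level table (PML4, PDP, PD, PT), 9 index bits
per level, 4 KiB pages, and an unbounded cache with no replacement.
"""

LEVELS = 4          # PML4, PDP, PD, PT
BITS_PER_LEVEL = 9  # 512 entries per table
PAGE_SHIFT = 12     # 4 KiB pages
ROOT = 0            # id of the root (PML4) table


def index(va: int, level: int) -> int:
    """Index into the table at `level` (0 = PML4, 3 = PT)."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
    return (va >> shift) & ((1 << BITS_PER_LEVEL) - 1)


def prefix(va: int, depth: int) -> int:
    """Upper VA bits consumed by the first `depth` levels of the walk."""
    return va >> (PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - depth))


class PageTable:
    """Radix tree stored as {table id: {index: child table id or frame}}."""

    def __init__(self) -> None:
        self.tables = {ROOT: {}}
        self.next_id = ROOT + 1

    def map(self, va: int, frame: int) -> None:
        table = ROOT
        for level in range(LEVELS - 1):
            entries = self.tables[table]
            i = index(va, level)
            if i not in entries:  # allocate a missing intermediate table
                entries[i] = self.next_id
                self.tables[self.next_id] = {}
                self.next_id += 1
            table = entries[i]
        self.tables[table][index(va, LEVELS - 1)] = frame


class TranslationCache:
    """Partial translations: (depth, VA prefix) -> id of table at `depth`."""

    def __init__(self) -> None:
        self.entries = {}

    def lookup(self, va: int) -> tuple:
        # Try the deepest partial translation first to skip the most levels.
        for depth in range(LEVELS - 1, 0, -1):
            table = self.entries.get((depth, prefix(va, depth)))
            if table is not None:
                return depth, table
        return 0, ROOT  # miss: start the walk at the root

    def fill(self, va: int, depth: int, table: int) -> None:
        self.entries[(depth, prefix(va, depth))] = table


def walk(pt: PageTable, tc, va: int) -> tuple:
    """Return (physical address, page-table memory reads); tc may be None."""
    depth, table = tc.lookup(va) if tc is not None else (0, ROOT)
    reads = 0
    for level in range(depth, LEVELS):
        entry = pt.tables[table][index(va, level)]
        reads += 1
        if level < LEVELS - 1:  # entry points at the next-level table
            table = entry
            if tc is not None:
                tc.fill(va, level + 1, table)
    # After the PT level, `entry` holds the physical frame number.
    return (entry << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1)), reads


pt = PageTable()
base = 0x7F00_0000_0000
for n in range(4):
    pt.map(base + n * 4096, frame=100 + n)

tc = TranslationCache()
print(walk(pt, None, base))       # -> (409600, 4): full walk, no cache
print(walk(pt, tc, base))         # -> (409600, 4): cold cache, fills entries
print(walk(pt, tc, base + 4096))  # -> (413696, 1): cached PD entry, PT read only
```

Each entry is keyed by the virtual-address prefix consumed so far. A lookup can therefore pick the deepest matching partial translation, and a neighbouring page costs a single page-table read instead of four. Real MMU caches are small and have limited associativity, which this sketch deliberately ignores.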