On-line caching as cache size varies
SODA '91 Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A new page table for 64-bit address spaces
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Proceedings of the 27th annual international symposium on Computer architecture
Address space sparsity and fine granularity
EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Database System Implementation
Database System Implementation
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
SPEC CPU2006 benchmark descriptions
ACM SIGARCH Computer Architecture News
Software prefetching and caching for translation lookaside buffers
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Accelerating two-dimensional page walks for virtualized systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Selective hardware/software memory virtualization
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Enhancing virtualized application performance through dynamic adaptive paging mode selection
Proceedings of the 8th ACM international conference on Autonomic computing
SpecTLB: a mechanism for speculative address translation
Proceedings of the 38th annual international symposium on Computer architecture
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Revisiting hardware-assisted page walks for virtualized systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
A multi-core memory organization for 3-d DRAM as main memory
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
CoLT: Coalesced Large-Reach TLBs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient virtual memory for big memory servers
Proceedings of the 40th Annual International Symposium on Computer Architecture
An optimized page translation for mobile virtualization
Proceedings of the 50th Annual Design Automation Conference
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Revisiting memory management on virtualized environments
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page table. In particular, these caches accelerate the page table walk that occurs after a miss in the Translation Lookaside Buffer. This paper shows that the most effective MMU caches are translation caches, which store partial translations and allow the page walk hardware to skip one or more levels of the page table. In recent years, both AMD and Intel processors have implemented MMU caches. However, their implementations are quite different and represent distinct points in the design space. This paper introduces three new MMU cache structures that round out the design space and directly compares the effectiveness of all five organizations. This comparison shows that two of the newly introduced structures, both of which are translation cache variants, are better than existing structures in many situations. Finally, this paper contributes to the age-old discourse concerning the relative effectiveness of different page table organizations. Generally speaking, earlier studies concluded that organizations based on hashing, such as the inverted page table, outperformed organizations based upon radix trees for supporting large virtual address spaces. However, these studies did not take into account the possibility of caching page table entries from the higher levels of the radix tree. This paper shows that any of the five MMU cache structures will reduce radix tree page table DRAM accesses far below an inverted page table.