Surpassing the TLB performance of superpages with less operating system support

Authors:
Madhusudhan Talluri; Mark D. Hill
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison, WI; Computer Sciences Department, University of Wisconsin, Madison, WI
Venue:
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Year:
1994

Abstract

Many commercial microprocessor architectures have added translation lookaside buffer (TLB) support for superpages. Superpages differ from segments because their size must be a power-of-two multiple of the base page size and they must be aligned in both virtual and physical address spaces. Very large superpages (e.g., 1MB) are clearly useful for mapping special structures, such as kernel data or frame buffers. This paper considers the architectural and operating system support required to exploit medium-sized superpages (e.g., 64KB, i.e., sixteen times a 4KB base page size). First, we show that superpages improve TLB performance only after invasive operating system modifications that introduce considerable overhead.

We then propose two subblock TLB designs as alternate ways to improve TLB performance. Analogous to a subblock cache, a complete-subblock TLB associates a tag with a superpage-sized region but has valid bits, physical page number, attributes, etc., for each possible base page mapping. A partial-subblock TLB entry is much smaller than a complete-subblock TLB entry, because it shares physical page number and attribute fields across base page mappings. A drawback of a partial-subblock TLB is that base page mappings can share a TLB entry only if they map to consecutive physical pages and have the same attributes. We propose a physical memory allocation algorithm, page reservation, that makes this sharing more likely. When page reservation is used, experimental results show partial-subblock TLBs perform better than superpage TLBs, while requiring simpler operating system changes. If operating system changes are inappropriate, however, complete-subblock TLBs perform best.
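The constraints named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the names `is_valid_superpage`, `PartialSubblockEntry` and `ReservingAllocator` are hypothetical, the sizes follow the abstract's 4KB/64KB example, and "consecutive physical pages" is assumed here to mean that each base page sits at the same offset inside an aligned physical block as it does inside its virtual block. The allocator only mimics the idea of page reservation (setting aside an aligned physical block on the first fault in a virtual block) and omits reclaiming reservations under memory pressure.

```python
from dataclasses import dataclass, field

BASE_PAGE = 4 * 1024      # 4KB base page, as in the abstract's example
SUBBLOCK_FACTOR = 16      # 64KB region = sixteen base pages


def is_valid_superpage(vaddr: int, paddr: int, size: int) -> bool:
    """Size is a power-of-two multiple of the base page; both addresses are aligned."""
    pages = size // BASE_PAGE
    power_of_two = size % BASE_PAGE == 0 and pages > 0 and pages & (pages - 1) == 0
    return power_of_two and vaddr % size == 0 and paddr % size == 0


@dataclass
class PartialSubblockEntry:
    """Hypothetical entry: one tag, shared physical block and attributes, per-page valid bits."""
    vblock: int           # virtual page number // SUBBLOCK_FACTOR (the tag)
    pblock: int           # physical page number // SUBBLOCK_FACTOR (shared field)
    attrs: str            # attributes shared by all base pages in the entry
    valid: list = field(default_factory=lambda: [False] * SUBBLOCK_FACTOR)

    def can_share(self, vpn: int, ppn: int, attrs: str) -> bool:
        # Assumed sharing rule: same blocks, same offset in both, same attributes.
        same_block = (vpn // SUBBLOCK_FACTOR == self.vblock
                      and ppn // SUBBLOCK_FACTOR == self.pblock)
        same_offset = vpn % SUBBLOCK_FACTOR == ppn % SUBBLOCK_FACTOR
        return same_block and same_offset and attrs == self.attrs

    def insert(self, vpn: int, ppn: int, attrs: str) -> bool:
        if not self.can_share(vpn, ppn, attrs):
            return False  # this mapping would need a separate TLB entry
        self.valid[vpn % SUBBLOCK_FACTOR] = True
        return True

    def translate(self, vpn: int):
        block, offset = divmod(vpn, SUBBLOCK_FACTOR)
        if block == self.vblock and self.valid[offset]:
            return self.pblock * SUBBLOCK_FACTOR + offset
        return None       # TLB miss for this base page


class ReservingAllocator:
    """Toy page reservation: the first fault in a virtual block reserves an aligned
    physical block; later faults in that block get the frame at the matching offset.
    Raises IndexError when no free block is left (reclamation is not modeled)."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.reserved = {}  # vblock -> pblock

    def allocate(self, vpn: int) -> int:
        vblock, offset = divmod(vpn, SUBBLOCK_FACTOR)
        if vblock not in self.reserved:
            self.reserved[vblock] = self.free_blocks.pop(0)
        return self.reserved[vblock] * SUBBLOCK_FACTOR + offset
```

Under these assumptions, frames handed out by `ReservingAllocator` for pages of one virtual block always satisfy `can_share`, which is the sense in which reservation makes sharing an entry more likely; a frame chosen without regard to offset generally would not.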