Increasing TLB reach using superpages backed by shadow memory

Authors: Mark Swanson, Leigh Stoller, John Carter
Affiliation (all authors): Department of Computer Science, University of Utah, Salt Lake City, UT
Venue: Proceedings of the 25th annual international symposium on Computer architecture
Year: 1998

Abstract

The reach of a TLB, that is, the amount of memory that can be accessed without causing a TLB fault, is failing to keep pace with the increasingly large working sets of applications. We propose to extend TLB reach via a novel Memory Controller TLB (MTLB) that lets us aggressively create superpages from non-contiguous, unaligned regions of physical memory. This flexibility increases the OS's ability to use superpages on arbitrary application data. The MTLB supports shadow pages: regions of physical address space for which the MTLB remaps accesses to "real" physical pages. Because the MTLB preserves referenced and dirty bits for each base page, the OS can swap shadow-backed superpages one base page at a time, which conventional superpages do not allow. Simulation of five applications, including two SPECint95 benchmarks, demonstrated that a modest-sized MTLB improves the performance of applications with moderate-to-high TLB miss rates by 5-20%. Simulation also showed that this mechanism can more than double the effective reach of a processor TLB with no modification to the processor MMU.
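The following Python sketch models the remapping step described in the abstract. It is a simplified illustration under stated assumptions, not the authors' hardware design. The 4 KB base page, the 16-page (64 KB) shadow superpage, the `MTLBEntry` class, its `translate` and `pages_to_write_back` methods, and the `tlb_reach` helper are all hypothetical. The model assumes the processor TLB maps one virtual superpage onto a contiguous range of shadow physical addresses. The MTLB then redirects each base page of that range to a real frame that need not be contiguous or aligned, and it records referenced and dirty bits per base page.

```python
from dataclasses import dataclass, field

# Illustrative parameters (assumptions, not values taken from the paper).
BASE_PAGE_SHIFT = 12                      # 4 KB base pages
BASE_PAGE_SIZE = 1 << BASE_PAGE_SHIFT
PAGES_PER_SUPER = 16                      # 64 KB shadow superpage
SUPER_SIZE = PAGES_PER_SUPER * BASE_PAGE_SIZE


@dataclass
class MTLBEntry:
    """Hypothetical MTLB entry: one shadow superpage backed by arbitrary real frames."""
    shadow_base: int                      # shadow physical address of the superpage
    real_frames: list[int]                # real frame number backing each base page
    referenced: list[bool] = field(default_factory=lambda: [False] * PAGES_PER_SUPER)
    dirty: list[bool] = field(default_factory=lambda: [False] * PAGES_PER_SUPER)

    def translate(self, shadow_addr: int, is_write: bool = False) -> int:
        """Remap a shadow physical address to a real one, updating per-base-page bits."""
        offset = shadow_addr - self.shadow_base
        if not 0 <= offset < SUPER_SIZE:
            raise ValueError("address outside this shadow superpage")
        index = offset >> BASE_PAGE_SHIFT           # which base page is touched
        self.referenced[index] = True
        if is_write:
            self.dirty[index] = True
        # Real frame for that base page, plus the offset within the page.
        return (self.real_frames[index] << BASE_PAGE_SHIFT) | (offset & (BASE_PAGE_SIZE - 1))

    def pages_to_write_back(self) -> list[int]:
        """Base pages the OS would need to write back when evicting this superpage."""
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]


def tlb_reach(entries: int, bytes_per_entry: int) -> int:
    """TLB reach: number of entries times bytes mapped by each entry."""
    return entries * bytes_per_entry


# Usage: back the superpage with scattered, non-contiguous frames.
entry = MTLBEntry(shadow_base=0x8000_0000,
                  real_frames=[1000 + 7 * i for i in range(PAGES_PER_SUPER)])
real = entry.translate(0x8000_0000 + 3 * BASE_PAGE_SIZE + 0x10, is_write=True)
print(hex(real))                          # 0x3fd010
print(entry.pages_to_write_back())        # [3]
# Hypothetical 64-entry processor TLB: reach in KB with base pages vs. superpages.
print(tlb_reach(64, BASE_PAGE_SIZE) // 1024, tlb_reach(64, SUPER_SIZE) // 1024)  # 256 4096
```

Because dirty state is kept per base page in this model, evicting the superpage would require writing back only base page 3 rather than all 64 KB. The reach figures are simple arithmetic for the assumed 64-entry TLB and are not results from the paper.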