A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Tradeoffs in supporting two page sizes
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Reducing TLB and memory overhead using online superpage promotion
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
PA-RISC 2.0 architecture
Studies of Windows NT performance using dynamic execution traces
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Optimizing the idle task and other MMU tricks
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Tolerating late memory traps in ILP processors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Boosting superpage utilization with the shadow memory and the partial-subblock TLB
Proceedings of the 14th international conference on Supercomputing
Online superpage promotion revisited (poster session)
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
IEEE Transactions on Computers
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Precise Data Locality Optimization of Nested Loops
The Journal of Supercomputing
Data Sequence Locality: A Generalization of Temporal Locality
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Memory System Support for Dynamic Cache Line Assembly
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Memory System Support for Irregular Applications
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Virtual memory on data diffusion architectures
Parallel Computing
Tolerating Late Memory Traps in Dynamically Scheduled Processors
IEEE Transactions on Computers
Concurrent Support of Multiple Page Sizes on a Skewed Associative TLB
IEEE Transactions on Computers
Efficient management for large-scale flash-memory storage systems with resource conservation
ACM Transactions on Storage (TOS)
Efficient address remapping in distributed shared-memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Impulse: Memory system support for scientific applications
Scientific Programming
Supporting superpage allocation without additional hardware support
Proceedings of the 7th international symposium on Memory management
Architectural support for shadow memory in multiprocessors
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Runtime monitoring on multicores via OASES
ACM SIGOPS Operating Systems Review
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Enigma: architectural and operating system support for reducing the impact of address translation
Proceedings of the 24th ACM International Conference on Supercomputing
Hi-index | 0.01 |
The amount of memory that can be accessed without causing a TLB fault, the reach of a TLB, is failing to keep pace with the increasingly large working sets of applications. We propose to extend TLB reach via a novel Memory Controller TLB (MTLB) that lets us aggressively create superpages from non-contiguous, unaligned regions of physical memory. This flexibility increases the OS's ability to use superpages on arbitrary application data. The MTLB supports shadow pages, regions of physical address space for which the MTLB remaps accesses to "real" physical pages. The MTLB preserves per-base-page referenced and dirty bits, which enables the OS to swap shadow-backed superpages a page at a time, unlike conventional superpages. Simulation of five applications, including two SPECint95 benchmarks, demonstrated that a modest-sized MTLB improves performance of applications with moderate-to-high TLB miss rates by 5-20%. Simulation also showed that this mechanism can more than double the effective reach of a processor TLB with no modification to the processor MMU.