An in-cache address translation mechanism
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Tradeoffs in supporting two page sizes
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Virtual memory, processes, and sharing in MULTICS
Communications of the ACM
Uniprocessor Virtual Memory without TLBs
IEEE Transactions on Computers
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Practical, transparent operating system support for superpages
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Memory resource management in VMware ESX server
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
A comparison of software and hardware techniques for x86 virtualization
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
General purpose operating system support for multiple page sizes
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
Virtual machine-provided context sensitive page mappings
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Accelerating two-dimensional page walks for virtualized systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Enigma: architectural and operating system support for reducing the impact of address translation
Proceedings of the 24th ACM International Conference on Supercomputing
Translation caching: skip, don't walk (the page table)
Proceedings of the 37th annual international symposium on Computer architecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Mnemosyne: lightweight persistent memory
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Communications of the ACM
SpecTLB: a mechanism for speculative address translation
Proceedings of the 38th annual international symposium on Computer architecture
Shared last-level TLBs for chip multiprocessors
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
ACM SIGARCH Computer Architecture News
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory reference energy with opportunistic virtual caching
Proceedings of the 39th Annual International Symposium on Computer Architecture
Revisiting hardware-assisted page walks for virtualized systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
Proceedings of the Third ACM Symposium on Cloud Computing
ACM Transactions on Architecture and Code Optimization (TACO)
CoLT: Coalesced Large-Reach TLBs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even using large pages. On the other hand, we find that these workloads use read-write permission on most pages, are provisioned not to swap, and rarely benefit from the full flexibility of page-based virtual memory. To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space. Direct segments use minimal hardware---base, limit and offset registers per core---to map contiguous virtual memory regions directly to contiguous physical memory. They eliminate the possibility of TLB misses for key data structures such as database buffer pools and in-memory key-value stores. Memory mapped by a direct segment may be converted back to paging when needed. We prototype direct-segment software support for x86-64 in Linux and emulate direct-segment hardware. For our workloads, direct segments eliminate almost all TLB misses and reduce the execution time wasted on TLB misses to less than 0.5%.