A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Tradeoffs in supporting two page sizes
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Consistency management for virtually indexed caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Sharing and protection in a single-address-space operating system
ACM Transactions on Computer Systems (TOCS) - Special issue on computer architecture
A software-controlled prefetching mechanism for software-managed TLBs
Microprocessing and Microprogramming
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
High-bandwidth address translation for multiple-issue processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Options for dynamic address translation in COMAs
Proceedings of the 25th annual international symposium on Computer architecture
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tolerating late memory traps in ILP processors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Tiling, Block Data Layout, and Memory Hierarchy Performance
IEEE Transactions on Parallel and Distributed Systems
Improving Hash Join Performance through Prefetching
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Concurrent Support of Multiple Page Sizes on a Skewed Associative TLB
IEEE Transactions on Computers
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Proceedings of the 2006 workshop on Memory system performance and correctness
Improving hash join performance through prefetching
ACM Transactions on Database Systems (TODS)
Path: page access tracking to improve memory management
Proceedings of the 6th international symposium on Memory management
Accelerating two-dimensional page walks for virtualized systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Context-aware TLB preloading for interference reduction in embedded multi-tasked systems
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Translation caching: skip, don't walk (the page table)
Proceedings of the 37th annual international symposium on Computer architecture
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
SpecTLB: a mechanism for speculative address translation
Proceedings of the 38th annual international symposium on Computer architecture
ACM Transactions on Architecture and Code Optimization (TACO)
CoLT: Coalesced Large-Reach TLBs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient virtual memory for big memory servers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Cache isolation for virtualization of mixed general-purpose and real-time systems
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.01 |
Caching and other latency tolerating techniques have been quite successful in maintaining high memory system performance for general purpose processors. However, TLB misses have become a serious bottleneck as working sets are growing beyond the capacity of TLBs.This work presents one of the first attempts to hide TLB miss latency by using preloading techniques. We present results for traditional next-page TLB miss preloading - an approach shown to cut some of the misses. However, a key contribution of this work is a novel TLB miss prediction algorithm based on the concept of “recency”, and we show that it can predict over 55% of the TLB misses for the five commercial applications considered.