The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A simulation based study of TLB performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Design tradeoffs for software-managed TLBs
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Architectural support for translation table management in large address space machines
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Surpassing the TLB performance of superpages with less operating system support
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Performance of the VAX-11/780 translation buffer: simulation and measurement
ACM Transactions on Computer Systems (TOCS)
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Options for dynamic address translation in COMAs
Proceedings of the 25th annual international symposium on Computer architecture
A look at several memory management units, TLB-refill mechanisms, and page table organizations
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Proceedings of the 27th annual international symposium on Computer architecture
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Use of superpages and subblocking in the address translation hierarchy
Use of superpages and subblocking in the address translation hierarchy
IEEE Transactions on Computers
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Translation caching: skip, don't walk (the page table)
Proceedings of the 37th annual international symposium on Computer architecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
SpecTLB: a mechanism for speculative address translation
Proceedings of the 38th annual international symposium on Computer architecture
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
Revisiting hardware-assisted page walks for virtualized systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
CoLT: Coalesced Large-Reach TLBs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient virtual memory for big memory servers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Cache isolation for virtualization of mixed general-purpose and real-time systems
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Translation Lookaside Buffers (TLBs) are commonly employed in modern processor designs and have considerable impact on overall system performance. A number of past works have studied TLB designs to lower access times and miss rates, specifically for uniprocessors. With the growing dominance of chip multiprocessors (CMPs), it is necessary to examine TLB performance in the context of parallel workloads. This work is the first to present TLB prefetchers that exploit commonality in TLB miss patterns across cores in CMPs. We propose and evaluate two Inter-Core Cooperative (ICC) TLB prefetching mechanisms, assessing their effectiveness at eliminating TLB misses both individually and together. Our results show these approaches require at most modest hardware and can collectively eliminate 19% to 90% of data TLB (D-TLB) misses across the surveyed parallel workloads. We also compare performance improvements across a range of hardware and software implementation possibilities. We find that while a fully-hardware implementation results in average performance improvements of 8-46% for a range of TLB sizes, a hardware/software approach yields improvements of 4-32%. Overall, our work shows that TLB prefetchers exploiting inter-core correlations can effectively eliminate TLB misses.