PS-TLB: Leveraging page classification information for fast, scalable and efficient translation for future CMPs

Authors:
Yong Li; Rami Melhem; Alex K. Jones
Affiliations:
University of Pittsburgh, Pittsburgh, PA; University of Pittsburgh, Pittsburgh, PA; University of Pittsburgh, Pittsburgh, PA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013

Abstract

Traversing the page table during virtual-to-physical address translation causes pipeline stalls when misses occur in the translation lookaside buffer (TLB). State-of-the-art translation proposals typically optimize a single aspect of translation performance (e.g., translation sharing or context switch performance), with potential trade-offs of additional hardware complexity, increased translation latency, or reduced scalability. In this article, we propose the partial sharing TLB (PS-TLB), a fast and scalable solution that reduces off-chip translation misses without sacrificing the timing-critical requirement of on-chip translation. We introduce the partial sharing buffer (PSB), which leverages application page sharing characteristics using minimal additional hardware resources. Compared to the leading TLB proposal that leverages sharing, PS-TLB improves translation latency by more than 45% and yields a 9% application speedup while using fewer storage resources. In addition, the page classification and PS-TLB architecture enable further optimizations, including a reduction of over 30% in interprocessor interrupts for coherence and fewer context switch misses, with fewer resources than existing methods.
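The abstract does not spell out how pages are classified or how classification cuts coherence interrupts. The following is a minimal illustrative sketch, not the PS-TLB hardware design: it assumes a page is "private" when exactly one core has touched it and "shared" otherwise, and that a TLB invalidation only needs to interrupt the other cores known to have accessed the page. The function names `classify_pages` and `shootdown_ipis` and the access-trace format are hypothetical.

```python
from collections import defaultdict


def classify_pages(accesses):
    """Classify pages from an iterable of (core_id, page_number) accesses.

    Assumption for illustration: a page touched by one core is 'private',
    a page touched by two or more cores is 'shared'.
    Returns (classes, cores_by_page).
    """
    cores_by_page = defaultdict(set)
    for core, page in accesses:
        cores_by_page[page].add(core)
    classes = {
        page: ("private" if len(cores) == 1 else "shared")
        for page, cores in cores_by_page.items()
    }
    return classes, dict(cores_by_page)


def shootdown_ipis(page, initiator, num_cores, cores_by_page=None):
    """Count interprocessor interrupts needed to invalidate `page`.

    Without sharer information, every other core is interrupted (broadcast).
    With it, only the other cores that accessed the page are interrupted.
    """
    if cores_by_page is None:
        return num_cores - 1
    return len(cores_by_page.get(page, set()) - {initiator})


if __name__ == "__main__":
    # Hypothetical trace on a 4-core system: (core_id, page_number).
    trace = [(0, 0x10), (0, 0x11), (1, 0x11), (2, 0x12)]
    classes, sharers = classify_pages(trace)
    for page in sorted(classes):
        broadcast = shootdown_ipis(page, 0, 4)
        targeted = shootdown_ipis(page, 0, 4, sharers)
        print(hex(page), classes[page], broadcast, targeted)
```

In this toy model, invalidating a private page from its owning core requires no interrupts at all, which is the intuition behind using classification to reduce coherence traffic; the actual savings reported in the abstract come from the authors' evaluation, not from this sketch.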