A simulation based study of TLB performance

Authors:
J. Bradley Chen; Anita Borg; Norman P. Jouppi
Venue:
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Year:
1992


Abstract

This paper presents the results of a simulation-based study of various translation lookaside buffer (TLB) architectures, in the context of a modern VLSI RISC processor. The simulators used address traces generated by instrumented versions of the SPEC marks and several other programs running on a DECstation 5000. The performance of two-level TLBs and fully-associative TLBs was investigated. The amount of memory mapped was found to be the dominant factor in TLB performance. Small first-level FIFO instruction TLBs can be effective in two-level TLB configurations. For some applications, the cycles-per-instruction (CPI) loss due to TLB misses can be reduced from as much as 5 CPI to negligible levels, with typical TLB parameters, through the use of variable-sized pages.
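
The trace-driven approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the function names `simulate_tlb` and `cpi_loss`, the synthetic trace, and the miss penalty are all assumptions chosen for illustration. The sketch models a single fully-associative TLB with FIFO or LRU replacement and shows why the amount of memory mapped (entries times page size) matters more than either parameter alone.

```python
from collections import OrderedDict


def simulate_tlb(trace, entries, page_size, policy="fifo"):
    """Count misses of a fully-associative TLB over a list of virtual addresses.

    Hypothetical helper for illustration; policy is "fifo" or "lru".
    """
    tlb = OrderedDict()  # keys are virtual page numbers, oldest entry first
    misses = 0
    for addr in trace:
        vpn = addr // page_size
        if vpn in tlb:
            if policy == "lru":
                tlb.move_to_end(vpn)  # refresh recency; FIFO keeps insertion order
            continue
        misses += 1
        if len(tlb) >= entries:
            tlb.popitem(last=False)  # evict the oldest entry
        tlb[vpn] = True
    return misses


def cpi_loss(misses, instructions, miss_penalty_cycles):
    """CPI lost to TLB misses; the miss penalty is an assumed parameter."""
    return misses * miss_penalty_cycles / instructions


# Synthetic trace (an assumption, not measured data): a loop that touches
# eight consecutive 4 KB regions, repeated ten times.
trace = [region * 4096 for region in range(8)] * 10

small = simulate_tlb(trace, entries=4, page_size=4096)      # maps 16 KB
wide = simulate_tlb(trace, entries=8, page_size=4096)       # maps 32 KB
big_pages = simulate_tlb(trace, entries=4, page_size=32768) # maps 128 KB

print(small, wide, big_pages)
```

In this toy trace the 4-entry TLB with 4 KB pages misses on every reference, because the loop touches more memory than the TLB maps. Doubling the entries or enlarging the page removes all but the compulsory misses, which is the effect the abstract attributes to variable-sized pages.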