High-bandwidth address translation for multiple-issue processors

Authors:
Todd M. Austin; Gurindar S. Sohi
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 W. Dayton Street, Madison, WI; Computer Sciences Department, University of Wisconsin-Madison, 1210 W. Dayton Street, Madison, WI
Venue:
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Year:
1996

Abstract

In an effort to push the envelope of system performance, microprocessor designs are continually exploiting higher levels of instruction-level parallelism, resulting in increasing bandwidth demands on the address translation mechanism. Most current microprocessor designs meet this demand with a multi-ported TLB. While this design provides an excellent hit rate at each port, its access latency and area grow very quickly as the number of ports is increased. As bandwidth demands continue to increase, multi-ported designs will soon impact memory access latency.

We present four high-bandwidth address translation mechanisms with latency and area characteristics that scale better than a multi-ported TLB design. We extend traditional high-bandwidth memory design techniques to address translation, developing interleaved and multi-level TLB designs. In addition, we introduce two new designs crafted specifically for high-bandwidth address translation. Piggyback ports are introduced as a technique to exploit spatial locality in simultaneous translation requests, allowing accesses to the same virtual memory page to combine their requests at the TLB access port. Pretranslation is introduced as a technique for attaching translations to base register values, making it possible to reuse a single translation many times.

We perform extensive simulation-based studies to evaluate our designs. We vary key system parameters, such as processor model, page size, and number of architected registers, to see what effects these changes have on the relative merits of each approach. A number of designs show particular promise. Multi-level TLBs with as few as eight entries in the upper-level TLB nearly achieve the performance of a TLB with unlimited bandwidth. Piggyback ports combined with a lesser-ported TLB structure, e.g., an interleaved or multi-ported TLB, also perform well. Pretranslation over a single-ported TLB performs almost as well as a same-sized multi-level TLB with the added benefit of decreased access latency for physically indexed caches.
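
The piggyback-port idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the function name `piggyback_translate`, the 4 KB page size, and the dictionary-based TLB are all hypothetical choices made for illustration. It shows how simultaneous requests to the same virtual page might share a single TLB port access, while requests to additional pages wait once the ports run out.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # assumption: 4 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def piggyback_translate(vaddrs, tlb, ports):
    """Model one cycle of translation with request combining.

    vaddrs: virtual addresses issued in the same cycle
    tlb:    dict mapping virtual page number -> physical frame number
    ports:  number of TLB access ports (hypothetical parameter)

    Returns (translated, deferred): a dict of vaddr -> paddr for requests
    served this cycle, and a list of requests that must retry later.
    """
    # Group requests by virtual page number, keeping issue order.
    by_page = OrderedDict()
    for va in vaddrs:
        by_page.setdefault(va >> PAGE_SHIFT, []).append(va)

    translated, deferred = {}, []
    for port, (vpn, group) in enumerate(by_page.items()):
        # Each distinct page occupies one port. Pages beyond the port
        # count, and TLB misses (not modeled here), are deferred.
        if port >= ports or vpn not in tlb:
            deferred.extend(group)
            continue
        pfn = tlb[vpn]
        # Every request in the group reuses the single looked-up frame.
        for va in group:
            translated[va] = (pfn << PAGE_SHIFT) | (va & OFFSET_MASK)
    return translated, deferred


if __name__ == "__main__":
    tlb = {0x1: 0x80, 0x2: 0x90}
    done, waiting = piggyback_translate(
        [0x1000, 0x1008, 0x2004, 0x1FF0], tlb, ports=1)
    print({hex(k): hex(v) for k, v in done.items()}, [hex(w) for w in waiting])
```

In this example, a single port serves three of the four requests because they fall in the same page; only the request to the second page is deferred. Without combining, the same single port would serve one request per cycle.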