Software prefetching and caching for translation lookaside buffers

Authors:
Kavita Bala;M. Frans Kaashoek;William E. Weihl
Affiliations:
MIT Laboratory for Computer Science, Cambridge, MA;MIT Laboratory for Computer Science, Cambridge, MA;MIT Laboratory for Computer Science, Cambridge, MA
Venue:
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Year:
1994

Abstract

A number of interacting trends in operating system structure, processor architecture, and memory systems are increasing both the rate of translation lookaside buffer (TLB) misses and the cost of servicing a miss. This paper presents two novel software schemes, implemented under Mach 3.0, to decrease both the number and the cost of kernel TLB misses (i.e., misses on kernel data structures, including user page tables). The first scheme is a new use of prefetching for TLB entries on the IPC path, and the second scheme is a new use of software caching of TLB entries for hierarchical page table organizations. For a range of applications, prefetching decreases the number of kernel TLB misses by 40% to 50%, and caching decreases TLB penalties by providing a fast path for over 90% of the misses. Our caching scheme also decreases the number of nested TLB traps due to the page table hierarchy, reducing the number of kernel TLB miss traps for applications by 20% to 40%. Prefetching and caching, when used alone, each improve application performance by up to 3.5%; when used together, they improve application performance by up to 3%. On synthetic benchmarks that involve frequent communication among several different address spaces (and thus put more pressure on the TLB), prefetching improves overall performance by about 6%, caching improves overall performance by about 10%, and the two used together improve overall performance by about 12%. Our techniques are very effective in reducing kernel TLB penalties, which currently range from 1% to 5% of application runtime for the benchmarks studied. Since processor speeds continue to increase relative to memory speeds, our schemes should be even more effective in improving application performance in future architectures.
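
The two ideas in the abstract can be illustrated with a toy model. The sketch below is not the authors' Mach 3.0 implementation. It is a minimal Python simulation under stated assumptions: a small LRU hardware TLB, a direct-mapped software cache that is consulted on a TLB miss before the page table is walked, and a prefetch call that preloads entries ahead of use (for example, before an IPC). The class name `ToyTLB`, its methods, and the table sizes are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class ToyTLB:
    """Toy model: a small LRU TLB backed by a direct-mapped software cache."""

    def __init__(self, tlb_entries=4, cache_slots=64):
        self.tlb = OrderedDict()            # vpn -> pfn, least recently used first
        self.tlb_entries = tlb_entries
        self.cache = [None] * cache_slots   # software cache of (vpn, pfn) pairs
        self.page_table = {}                # stand-in for a hierarchical page table
        self.fast_misses = 0                # misses served by the software cache
        self.slow_misses = 0                # misses that needed a page-table walk

    def map(self, vpn, pfn):
        """Record a virtual-to-physical mapping in the page table."""
        self.page_table[vpn] = pfn

    def _install(self, vpn, pfn):
        """Place an entry in the TLB, evicting the LRU entry if it is full."""
        self.tlb[vpn] = pfn
        self.tlb.move_to_end(vpn)
        if len(self.tlb) > self.tlb_entries:
            self.tlb.popitem(last=False)

    def _refill(self, vpn):
        """Return (pfn, served_by_cache) for a vpn that missed in the TLB."""
        slot = vpn % len(self.cache)
        entry = self.cache[slot]
        if entry is not None and entry[0] == vpn:
            return entry[1], True           # fast path: software cache hit
        pfn = self.page_table[vpn]          # slow path: walk the page table
        self.cache[slot] = (vpn, pfn)
        return pfn, False

    def translate(self, vpn):
        """Translate a vpn, counting fast and slow misses separately."""
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)
            return self.tlb[vpn]
        pfn, fast = self._refill(vpn)
        if fast:
            self.fast_misses += 1
        else:
            self.slow_misses += 1
        self._install(vpn, pfn)
        return pfn

    def prefetch(self, vpns):
        """Preload entries ahead of use; these are not counted as misses."""
        for vpn in vpns:
            if vpn not in self.tlb:
                pfn, _ = self._refill(vpn)
                self._install(vpn, pfn)
```

In this model, a page that has been evicted from the TLB but is still present in the software cache costs a "fast" miss instead of a full page-table walk, and a page named in a `prefetch` call before it is touched causes no counted miss at all. The real cost ratios and hit rates depend on the hardware and kernel and are not captured here.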