Reducing network latency using subpages in a global memory environment

Authors:
Hervé A. Jamrozik; Michael J. Feeley; Geoffrey M. Voelker; James Evans, II; Anna R. Karlin; Henry M. Levy; Mary K. Vernon
Affiliations:
Department of Computer Science and Engineering, University of Washington (all authors)
Venue:
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Year:
1996

Abstract

New high-speed networks encourage the use of network memory as a cache for virtual memory and file pages, thereby reducing the need for disk access. Because pages are the fundamental transfer and access units in remote memory systems, page size is a key performance factor. Recently, the page sizes of modern processors have been increasing in order to provide more TLB coverage and to amortize disk access costs. Unfortunately, on high-speed networks, small transfers are needed to provide low latency. The trend toward larger pages is thus at odds with the use of network memory on high-speed networks.

This paper studies the use of subpages as a means of reducing transfer size and latency in a remote-memory environment. Using trace-driven simulation, we show how and why subpages reduce latency and improve the performance of programs using network memory. Our results show that memory-intensive applications execute up to 1.8 times faster with 1K-byte subpages than with full 8K-byte pages in the global memory system. The same applications using 1K-byte subpages execute up to 4 times faster than they would using the disk for backing store. Using a prototype implementation on the DEC Alpha and the AN2 network, we demonstrate how subpages can reduce remote-memory fault time; e.g., our prototype satisfies a fault on a 1K-byte subpage stored in remote memory in 0.5 milliseconds, one third the time required for a full page.
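
The prototype figures quoted in the abstract can be turned into a rough back-of-the-envelope model. The sketch below is illustrative only and is not taken from the paper: it assumes that remote fault time grows linearly with transfer size (a fixed per-fault cost plus a per-kilobyte cost), and it treats "one third the time of a full page" as exactly 3 x 0.5 ms. The function names and the linear model are hypothetical; only the 1K, 8K, and 0.5 ms values come from the abstract.

```python
# Figures taken from the abstract.
SUBPAGE_KB = 1
FULL_PAGE_KB = 8
SUBPAGE_FAULT_MS = 0.5
# ASSUMPTION: "one third the time of a full page" is treated as exact.
FULL_PAGE_FAULT_MS = 3 * SUBPAGE_FAULT_MS


def fit_linear_model(size_a, time_a, size_b, time_b):
    """Return (fixed_ms, per_kb_ms) for the line through two (size, time) points."""
    per_kb_ms = (time_b - time_a) / (size_b - size_a)
    fixed_ms = time_a - per_kb_ms * size_a
    return fixed_ms, per_kb_ms


def fault_time_ms(size_kb, fixed_ms, per_kb_ms):
    """Estimate remote fault time for a transfer of size_kb kilobytes.

    ASSUMPTION: latency is linear in transfer size; the paper may well
    observe different behavior at intermediate sizes.
    """
    return fixed_ms + per_kb_ms * size_kb


fixed_ms, per_kb_ms = fit_linear_model(
    SUBPAGE_KB, SUBPAGE_FAULT_MS, FULL_PAGE_KB, FULL_PAGE_FAULT_MS
)

if __name__ == "__main__":
    print(f"fixed cost: {fixed_ms:.3f} ms, per-KB cost: {per_kb_ms:.3f} ms")
    for size_kb in (1, 2, 4, 8):
        estimate = fault_time_ms(size_kb, fixed_ms, per_kb_ms)
        print(f"{size_kb}K transfer: about {estimate:.2f} ms")
```

Under these assumptions the fixed cost comes to about 0.36 ms per fault, so most of the time for a 1K subpage would be per-fault overhead rather than data transfer. The intermediate sizes are interpolations and are not measurements from the paper.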