Application buffer-cache management for performance: running the world's largest MRTG

Authors:
David Plonka;Archit Gupta;Dale Carder
Affiliations:
University of Wisconsin-Madison;University of Wisconsin-Madison;University of Wisconsin-Madison
Venue:
LISA'07 Proceedings of the 21st conference on Large Installation System Administration Conference
Year:
2007



Abstract

An operating system's readahead and buffer-cache behaviors can significantly affect application performance; most often they improve it, but occasionally they worsen it. To avoid unintended I/O latencies, many database systems sidestep these OS features by minimizing or eliminating application file I/O. Network traffic measurement applications, however, are commonly built atop a high-performance file-based database: the Round Robin Database (RRD) Tool. While RRD is successful, experience has led the network operations community to believe that its scalability is limited to tens of thousands, or perhaps one hundred thousand, RRD files on a single system, which keeps it from being used to measure the largest managed networks today. We identify the bottleneck responsible for that experience and present two approaches to overcome it. In this paper, we provide a method and tools to expose the readahead and buffer-cache behaviors that are otherwise hidden from the user. We apply our method to a very large network traffic measurement system that experiences scalability problems and determine the performance bottleneck to be unnecessary disk reads and page faults caused by the default readahead behavior. We develop both a simulation and an analytical model of the performance-limiting page fault rate for RRD file updates. We develop and evaluate two approaches that alleviate this problem: application advice to disable readahead, and application-level caching. We demonstrate their effectiveness by configuring and operating the world's largest Multi-Router Traffic Grapher (MRTG), with approximately 320,000 RRD files and over half a million data points measured every five minutes. Conservatively, our techniques approximately triple the capacity of very large MRTG and other RRD-based measurement systems.
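
The abstract does not name the interface used to give the kernel "application advice to disable readahead". On POSIX systems, one common way to express such advice is posix_fadvise with the POSIX_FADV_RANDOM hint, which on Linux typically turns off readahead for that file descriptor. The following is a minimal sketch under that assumption, not the authors' implementation: the function name update_without_readahead, the offset, and the payload are hypothetical, and the file is treated as an opaque block of bytes rather than a real RRD file.

```python
import os


def update_without_readahead(path, offset, payload):
    """Overwrite len(payload) bytes at `offset` in `path` and read them back.

    Hypothetical helper for illustration only. It advises the kernel that
    access to this file is random, so a small in-place update should not
    pull neighbouring pages into the buffer-cache via readahead.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Assumption: posix_fadvise is the advice mechanism. It is only
        # available on Unix-like platforms, so guard the call.
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 applies the hint to the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        # Small positional write, similar in spirit to updating one
        # fixed-size slot inside a preallocated round-robin file.
        os.pwrite(fd, payload, offset)
        # Read the same bytes back to confirm the update landed.
        return os.pread(fd, len(payload), offset)
    finally:
        os.close(fd)
```

The hint is advisory: the kernel may ignore it, and the update behaves the same either way. Only the amount of data read from disk around the touched page is expected to change, which is the effect the abstract identifies as the bottleneck.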