DataSeries: an efficient, flexible data format for structured serial data

  • Authors:
  • Eric Anderson; Martin Arlitt; Charles B. Morrey, III; Alistair Veitch

  • Affiliations:
  • HP Labs, Palo Alto, CA (all four authors)

  • Venue:
  • ACM SIGOPS Operating Systems Review
  • Year:
  • 2009

Abstract

Structured serial data is used in many scientific fields. Such data sets consist of a series of records and are typically written once, read many times, chronologically ordered, and read sequentially. In this paper we introduce DataSeries, an on-disk format, run-time library, and set of tools for storing and analyzing structured serial data. We identify six key properties that a system for storing and analyzing this type of data should provide, and describe how DataSeries was designed to provide them. We quantify the benefits of DataSeries through several experiments; in particular, we demonstrate that DataSeries exceeds the performance of common trace formats by at least a factor of two.