Abstracts: A Latency-Hiding Technique for High-Capacity

  • Authors:
  • Joel A. Fine; Thomas E. Anderson; Michael D. Dahlin; James Frew; Michael Olson; David A. Patterson

  • Venue:
  • Abstracts: A Latency-Hiding Technique for High-Capacity
  • Year:
  • 1992

Abstract

Extraordinary advances in digital storage technology are rapidly making cost-effective, multiple-terabyte information retrieval systems possible. The latency and bandwidth of these technologies are typically much worse than what users of computer systems are accustomed to. Unfortunately, the traditional techniques for reducing latency and improving bandwidth, caching and compression, will not by themselves work well with the access patterns that we anticipate for these high-capacity systems. We introduce and define a new storage management technique, called abstracts. An abstract is an extraction of the "essential" part of a data set. It is created using some combination of averaging, subsetting, rounding, or some other method of condensing the data. An abstract's composition is heavily dependent on the context in which it is used. Each data set can have multiple abstracts associated with it, each of which can be used to answer queries. When a query can be answered from an abstract, effective bandwidth increases, because we transfer much less data through the storage system. The counter-intuitive result is that abstracts on robot-based tape storage systems can have lower latency than full data sets on magnetic disks, because the inherent latency disadvantage of tertiary systems can be overcome by the reduction in transfer time due to the smaller transfer size. Moreover, because many abstracts can fit in faster storage in the space occupied by a single unabstracted data set, users can get the effect of magnetic disk latencies for very large objects. To evaluate the potential of abstracts, we examine four common queries as well as a detailed case study. We also study the statistical characteristics of several data sets in an effort to identify classes of abstracting functions.
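
The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a simple cost model in which access time is a fixed device latency plus transfer time, and it uses block averaging as one possible abstracting function. The function names, device parameters, and data set sizes are hypothetical values chosen for illustration; they are not figures from the report.

```python
def access_time(latency_s, size_bytes, bandwidth_bytes_per_s):
    """Assumed cost model: fixed device latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def block_average(values, block):
    """One possible abstracting function: replace each run of `block`
    consecutive values with its mean (the last run may be shorter)."""
    chunks = [values[i:i + block] for i in range(0, len(values), block)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical device parameters (assumptions, not measurements).
DISK_LATENCY_S = 0.02      # seek plus rotational delay
DISK_BANDWIDTH = 2e6       # bytes per second
TAPE_LATENCY_S = 60.0      # robot load plus tape positioning
TAPE_BANDWIDTH = 1e6       # bytes per second

# Hypothetical sizes: a 1 GB data set and an abstract 100 times smaller.
FULL_SIZE = 1e9
ABSTRACT_SIZE = FULL_SIZE / 100

full_on_disk = access_time(DISK_LATENCY_S, FULL_SIZE, DISK_BANDWIDTH)
abstract_on_tape = access_time(TAPE_LATENCY_S, ABSTRACT_SIZE, TAPE_BANDWIDTH)
abstract_on_disk = access_time(DISK_LATENCY_S, ABSTRACT_SIZE, DISK_BANDWIDTH)

print(f"full data set on disk: {full_on_disk:.2f} s")
print(f"abstract on tape:      {abstract_on_tape:.2f} s")
print(f"abstract on disk:      {abstract_on_disk:.2f} s")

# A tiny abstract: six readings condensed to three averages.
print(block_average([1, 2, 3, 4, 5, 6], 2))
```

Under these assumed numbers, transfer time dominates for the full data set, so the abstract on tape completes sooner than the full data set on disk even though the tape's fixed latency is far higher. With a much smaller data set or a less aggressive abstract, the tape's latency would dominate instead and the comparison would reverse.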