Abstracts: A Latency-Hiding Technique for High-Capacity

Authors:
Joel A. Fine;Thomas E. Anderson;Michael D. Dahlin;James Frew;Michael Olson;David A. Patterson
Venue:
Abstracts: A Latency-Hiding Technique for High-Capacity
Year:
1992

Citing 0
Cited 2

Query Pre-Execution and Batching in Paradise: A Two-Pronged Approach to the Efficient Processing of Queries on Tape-Resident Raster Images

SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
The ATREE: A Data Structure to Support Very Large Scientific Databases

ISD '99 Selected Papers from the International Workshop on Integrated Spatial Databases, Digital Images and GIS


Abstract

Extraordinary advances in digital storage technology are rapidly making possible cost-effective, multiple-terabyte information retrieval systems. The latency and bandwidth of these technologies are typically much worse than what users of computer systems are accustomed to. Unfortunately, the traditional techniques for reducing latency and improving bandwidth, caching and compression, will not by themselves work well with the access patterns that we anticipate for these high-capacity systems. We introduce and define a new storage management technique, called abstracts. An abstract is an extraction of the "essential" part of the data set. It is created using some combination of averaging, subsetting, rounding, or some other method of condensing the data. An abstract's composition is heavily dependent on the context in which it is used. Each data set can have multiple abstracts associated with it, each of which can be used to answer a query. When we answer a query from an abstract, effective bandwidth increases, because we transfer much less data through the storage system. The counter-intuitive result is that abstracts on robot-based tape storage systems can have lower latency than full data sets on magnetic disks, because the inherent latency disadvantage of tertiary systems can be overcome by the reduction in transfer time due to the smaller transfer size. Moreover, because many abstracts can fit in faster storage in the space occupied by a single unabstracted data set, users can get the effect of magnetic disk latencies for very large objects. To evaluate the potential of abstracts, we examine four common queries as well as a detailed case study. We also study the statistical characteristics of several data sets in an effort to identify classes of abstracting functions.
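
The latency argument in the abstract can be illustrated with a minimal sketch. It assumes a simple cost model in which retrieval time is a fixed access latency plus transfer time (size divided by bandwidth), and it uses block averaging as one possible abstracting function. The names `block_average` and `retrieval_time`, the device parameters, the data set size, and the reduction factor are all hypothetical and chosen only for illustration; none of them come from the paper.

```python
def block_average(values, factor):
    """Condense a sequence by averaging each run of `factor` consecutive values.

    This is one hypothetical abstracting function (averaging); the last
    block may be shorter than `factor` and is averaged over its own length.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    blocks = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(block) / len(block) for block in blocks]


def retrieval_time(size_mb, access_latency_s, bandwidth_mb_per_s):
    """Assumed cost model: fixed access latency plus transfer time."""
    return access_latency_s + size_mb / bandwidth_mb_per_s


# Hypothetical device parameters, not measurements from the paper.
DISK = {"access_latency_s": 0.02, "bandwidth_mb_per_s": 2.0}
TAPE_ROBOT = {"access_latency_s": 60.0, "bandwidth_mb_per_s": 1.0}

# Hypothetical data set: 1000 MB, abstracted by a factor of 100.
full_size_mb = 1000.0
reduction = 100
abstract_size_mb = full_size_mb / reduction

full_on_disk_s = retrieval_time(full_size_mb, **DISK)
abstract_on_tape_s = retrieval_time(abstract_size_mb, **TAPE_ROBOT)
abstract_on_disk_s = retrieval_time(abstract_size_mb, **DISK)

# A tiny demonstration of the abstracting function itself.
sample = [1, 2, 3, 4, 5, 6]
sample_abstract = block_average(sample, 2)

print(f"full data set on disk: {full_on_disk_s:.2f} s")
print(f"abstract on tape:      {abstract_on_tape_s:.2f} s")
print(f"abstract on disk:      {abstract_on_disk_s:.2f} s")
print(f"sample abstract:       {sample_abstract}")
```

Under these assumed numbers, the transfer time of the full data set dominates its retrieval time on disk, so the much smaller abstract arrives sooner even after paying the large fixed latency of the tape robot. With a smaller reduction factor or a smaller data set the comparison can go the other way, which is why the result depends on the access pattern and on how aggressively the data can be condensed.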