Disk-tape joins: synchronizing disk and tape access

  • Authors:
  • Jussi Myllymaki;Miron Livny

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin-Madison;Computer Sciences Department, University of Wisconsin-Madison

  • Venue:
  • Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Today large amounts of data are stored on tertiary storage media such as magnetic tapes and optical disks. DBMSs typically operate only on magnetic disks since they know how to maneuver disks and how to optimize accesses on them. Tertiary devices present a problem for DBMSs since these devices have dismountable media and have very different operational characteristics compared to magnetic disks. For instance, most tape drives offer very high capacity at low cost but are accessed sequentially, involve lengthy latencies, and deliver lower bandwidth. Typically, the scope of a DBMS's query optimizer does not include tertiary devices, and the DBMS might not even know how to control and operate upon tertiary-resident data. In a three-level hierarchy of storage devices (main memory, disk, tape), the typical solution is to elevate tape-resident data to disk devices, thus bringing such data into the DBMS' control, and then to perform the required operations on disk. This requires additional space on disk and may not give the lowest response time possible. With this challenge in mind, we studied the trade-offs between memory and disk requirements and the execution time of a join with the help of two well-known join methods. The conventional, disk-based Nested Block Join and Hybrid Hash Join were modified to operate directly on tapes. An experimental implementation of the modified algorithms gave us more insight into how the algorithms perform in practice. Our performance analysis shows that a DBMS desiring to operate on tertiary storage will benefit from special algorithms that operate directly on tape-resident data and take into account and exploit the mismatch in disk and tape characteristics.