Clustered multiattribute hash files

  • Authors:
  • D. Rotem

  • Affiliations:
  • Lawrence Berkeley Laboratory, University of California Berkeley, Berkeley, CA

  • Venue:
  • PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
  • Year:
  • 1989

Abstract

Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided either by searching a directory of some type, such as a tree directory or inverted index, or by computing a multiattribute hash function. Examples of the first approach are Multidimensional B-trees [Sch82] and K-D-B trees [Rob81] (see also [Sam84] for a survey of these methods), whereas multiattribute hashing methods are described, for example, in [Rot74], [Aho79], [Riv76] and [Ram83]. In addition, there are hybrid methods which combine hashing with a directory of some type [Ore84], [Nie84], [Fag79].

In all the work mentioned above, performance is measured in terms of the number of disk accesses made to retrieve the answer, without distinguishing whether these accesses are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in a single-user environment. Some evidence to support this claim is given in [Sal88, pg. 22], with the IBM 3380 disk drive as an example. For this type of disk, a comparison is made between accessing m blocks randomly and accessing a contiguous cluster of m blocks. The results show that for m = 10 the random access is slower than the clustered one by a factor of about 8, whereas for m = 100 it is slower by a factor of 25.

Another motivation for this work is optical disks. In this case there is a big advantage in clustering, since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88].

Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using Gray code rather than simple binary code. In this way, neighboring buckets contain records whose signatures differ in a single bit. It is then proved that the records which form the answer to a partial match query will tend to be contained in a smaller number of clusters than with the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid type files, extendible hashing and file methods which employ the z-ordering [Ore84].
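
The Gray-code layout attributed to [Fal86] above can be illustrated with a small sketch. It is not taken from either paper: it assumes the standard binary-reflected Gray code, assumes that the bucket at physical position p stores the records whose signature equals the Gray codeword of p, and writes a partial match query as a bit pattern in which `*` marks an unspecified bit. All function names are hypothetical.

```python
from itertools import product


def gray_encode(n):
    """Binary-reflected Gray codeword of the integer n."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Inverse of gray_encode: rank of the Gray codeword g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def matching_signatures(pattern):
    """All signatures matching a pattern such as '*1*' ('*' = unspecified bit)."""
    free = [i for i, c in enumerate(pattern) if c == "*"]
    signatures = []
    for bits in product("01", repeat=len(free)):
        chars = list(pattern)
        for i, b in zip(free, bits):
            chars[i] = b
        signatures.append(int("".join(chars), 2))
    return signatures


def count_clusters(positions):
    """Number of maximal runs of consecutive bucket positions."""
    ps = sorted(positions)
    return sum(1 for i, p in enumerate(ps) if i == 0 or p != ps[i - 1] + 1)


def clusters_binary(pattern):
    # Assumed binary layout: a bucket's position is its signature value.
    return count_clusters(matching_signatures(pattern))


def clusters_gray(pattern):
    # Assumed Gray layout: a bucket's position is the rank of its signature
    # in the Gray-code sequence.
    return count_clusters(gray_decode(s) for s in matching_signatures(pattern))


if __name__ == "__main__":
    for pattern in ("**1", "*1*"):
        print(pattern, clusters_binary(pattern), clusters_gray(pattern))
```

For the 3-bit query `**1`, the qualifying buckets sit at positions 1, 3, 5, 7 in the binary layout (four clusters) and at positions 1, 2, 5, 6 in the Gray layout (two clusters), which is the kind of reduction in random seeks the abstract describes.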