Clustered multiattribute hash files

Authors:
D. Rotem
Affiliations:
Lawrence Berkeley Laboratory, University of California Berkeley, Berkeley, CA
Venue:
PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Year:
1989

References:

Multiattribute hashing using Gray codes. SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
File structures: an analytic approach
Performance analysis and fundamental performance tradeoffs for CLV optical disks. SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems (TODS)
On the complexity of designing optimal partial-match retrieval systems. ACM Transactions on Database Systems (TODS)
Partial-match retrieval using hashing and descriptors. ACM Transactions on Database Systems (TODS)
Optimal partial-match retrieval when fields are independently specified. ACM Transactions on Database Systems (TODS)
Extendible hashing—a fast access method for dynamic files. ACM Transactions on Database Systems (TODS)
Optimality Properties of Multiple-Key Hashing Functions. Journal of the ACM (JACM)
The Quadtree and Related Hierarchical Data Structures. ACM Computing Surveys (CSUR)
Attribute based file organization in a paged memory environment. Communications of the ACM
The K-D-B-tree: a search structure for large multidimensional dynamic indexes. SIGMOD '81 Proceedings of the 1981 ACM SIGMOD international conference on Management of data
A class of data structures for associative searching. PODS '84 Proceedings of the 3rd ACM SIGACT-SIGMOD symposium on Principles of database systems
Combinatorial Algorithms: Theory and Practice

Abstract

Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided either by searching a directory of some type, such as a tree directory or an inverted index, or by computing a multiattribute hash function. Examples of the first approach are Multidimensional B-trees [Sch82] and K-D-B trees [Rob81] (see also [Sam84] for a survey of these methods), whereas multiattribute hashing methods are described, for example, in [Rot74], [Aho79], [Riv76] and [Ram83]. In addition, there are hybrid methods which combine hashing with a directory of some type [Ore84], [Nie84], [Fag79].

In all the work mentioned above, performance is measured in terms of the number of disk accesses made to retrieve the answer, without distinguishing whether these accesses are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in a single-user environment. Some evidence to support this claim is given in [Sal88, p. 22], with the IBM 3380 disk drive as an example. For this type of disk, accessing m blocks randomly is compared with accessing a contiguous cluster of m blocks. The results show that for m = 10 random access is slower than clustered access by a factor of about 8, whereas for m = 100 it is slower by a factor of 25.

Another motivation for this work is optical disks. Here clustering offers a big advantage, since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88].

Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using Gray code rather than simple binary code. In this way, neighboring buckets contain records whose signatures differ in a single bit. It is then proved that the records which form the answer to a partial match query will tend to be contained in a smaller number of clusters than under the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid type files, extendible hashing, and file methods which employ z-ordering [Ore84].
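
The Gray code arrangement described above can be illustrated with a small sketch. It is not taken from the paper or from [Fal86]; the function names (`gray_decode`, `matching_signatures`, `count_clusters`, `clusters`) and the pattern notation with `*` for unspecified bits are assumptions made for illustration only. The sketch counts how many runs of physically adjacent buckets (clusters) a partial match query touches when buckets are laid out in binary order versus Gray code order.

```python
from itertools import product


def gray_decode(g: int) -> int:
    """Return the rank i whose reflected Gray code i ^ (i >> 1) equals g."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def matching_signatures(pattern: str):
    """Yield each signature (as an int) matching a pattern over '0', '1', '*'."""
    free = [k for k, c in enumerate(pattern) if c == "*"]
    for bits in product("01", repeat=len(free)):
        s = list(pattern)
        for k, bit in zip(free, bits):
            s[k] = bit
        yield int("".join(s), 2)


def count_clusters(positions) -> int:
    """Count maximal runs of consecutive bucket positions."""
    pos = sorted(positions)
    return sum(1 for k, p in enumerate(pos) if k == 0 or p != pos[k - 1] + 1)


def clusters(pattern: str):
    """Return (clusters under binary layout, clusters under Gray code layout)."""
    sigs = list(matching_signatures(pattern))
    # Binary layout: the bucket with signature s sits at position s.
    binary = count_clusters(sigs)
    # Gray layout (assumed): the bucket with signature s sits at its Gray rank.
    gray = count_clusters(gray_decode(s) for s in sigs)
    return binary, gray


if __name__ == "__main__":
    for q in ("**1", "*1*", "1**"):
        print(q, clusters(q))
```

For 3-bit signatures, the query `**1` touches 4 clusters under the binary layout but only 2 under the Gray code layout, and `*1*` touches 2 versus 1. The sketch only counts clusters; it does not model seek or transfer times.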