Distribution-dependent hashing functions and their characteristics

Authors:
R. F. Deutscher; P. G. Sorenson; J. P. Tremblay
Affiliations:
University of Saskatchewan, Saskatoon, Saskatchewan, Canada; University of Saskatchewan, Saskatoon, Saskatchewan, Canada; University of Saskatchewan, Saskatoon, Saskatchewan, Canada
Venue:
SIGMOD '75 Proceedings of the 1975 ACM SIGMOD international conference on Management of data
Year:
1975

Abstract

In this paper, procedures are studied for storing, accessing, updating, and reorganizing data in large files whose organization is direct, an organization used when a fast response time is required. "Distribution-dependent" hashing functions and the division method are compared as methods of indirect addressing.

"Distribution-dependent" hashing functions are characterized. These hashing functions generate addresses from a set of keys by using knowledge of the distribution of that key set within the key space or range of keys. A study of the performance measures obtained during tests of these functions on several key sets indicates that, in certain cases, distribution-dependent methods perform better than the division method. This result is extended by a demonstration that distribution-dependent hashing functions can accommodate a change in the distribution of keys without being redefined. A number of insertions to and deletions from the key set can be made before a distribution-dependent hashing function gives poorer performance than the division method under identical circumstances.

If many additions are made to a set of keys, it becomes necessary to reorganize, in a larger storage area, the direct file of records identified by that key set. Although processor time must be sacrificed in order to redefine a distribution-dependent hashing function, the division method requires substantially greater access time in a reorganizational situation.
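To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of hashing function. It is not the authors' implementation: the abstract does not say which distribution-dependent functions were tested, so the piecewise-linear approximation of the cumulative key distribution used here is an assumption. The names `division_hash`, `make_distribution_hash`, and `num_intervals` are likewise hypothetical.

```python
def division_hash(key: int, table_size: int) -> int:
    """Division method: the address is the remainder of key / table_size."""
    return key % table_size


def make_distribution_hash(sample_keys, table_size, num_intervals=8):
    """Build a hash function from knowledge of the key distribution.

    Assumption (not taken from the paper): the cumulative distribution of
    the key set is approximated by a piecewise-linear curve over
    `num_intervals` equal-width intervals of the observed key range.
    """
    lo, hi = min(sample_keys), max(sample_keys)
    width = (hi - lo) / num_intervals or 1.0  # guard against a single-valued key set

    # Count how many sample keys fall into each interval.
    counts = [0] * num_intervals
    for k in sample_keys:
        i = min(int((k - lo) / width), num_intervals - 1)
        counts[i] += 1

    # Cumulative fraction of keys at each interval boundary.
    total = len(sample_keys)
    cum = [0.0]
    for c in counts:
        cum.append(cum[-1] + c / total)

    def h(key):
        x = min(max(key, lo), hi)          # clamp keys outside the sampled range
        pos = (x - lo) / width
        i = min(int(pos), num_intervals - 1)
        frac = pos - i                     # position inside interval i
        f = cum[i] + frac * (cum[i + 1] - cum[i])
        return min(int(f * table_size), table_size - 1)

    return h


# Hypothetical key set, skewed toward small values.
keys = [1, 2, 3, 4, 100, 200, 300, 1000]
h = make_distribution_hash(keys, table_size=8, num_intervals=4)
addresses = [h(k) for k in keys]
```

Because the address is derived from an estimate of the cumulative distribution, dense regions of the key space are spread over more addresses than sparse ones, and key order is preserved. The function keeps producing valid addresses as keys are inserted or deleted, but its estimate drifts away from the true distribution until it is rebuilt from a fresh sample.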