Distribution-dependent hashing functions and their characteristics

  • Authors:
  • R. F. Deutscher; P. G. Sorenson; J. P. Tremblay

  • Affiliations:
  • University of Saskatchewan, Saskatoon, Saskatchewan, Canada; University of Saskatchewan, Saskatoon, Saskatchewan, Canada; University of Saskatchewan, Saskatoon, Saskatchewan, Canada

  • Venue:
  • SIGMOD '75 Proceedings of the 1975 ACM SIGMOD international conference on Management of data
  • Year:
  • 1975

Abstract

In this paper procedures are studied for storing, accessing, updating, and reorganizing data in large files whose organization is direct, an organization used when a fast response time is required. "Distribution-dependent" hashing functions and the division method are compared as methods of indirect addressing.

"Distribution-dependent" hashing functions are characterized. These hashing functions generate addresses from a set of keys by using knowledge of the distribution of that key set within the key space or range of keys. A study of the performance measures obtained during tests of these functions on several key sets indicates that in certain cases, distribution-dependent methods perform better than the division method. This result is extended by a demonstration that distribution-dependent hashing functions can accommodate a change in the distribution of keys without being redefined. A number of insertions to and deletions from the key set can be made before a distribution-dependent hashing function gives poorer performance than the division method under identical circumstances.

If many additions are made to a set of keys, it becomes necessary to reorganize, in a larger storage area, the direct file of records identified by that key set. Although processor time must be sacrificed in order to redefine a distribution-dependent hashing function, the division method requires substantially greater access time in a reorganizational situation.
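
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `division_hash`, `make_distribution_hash`, and `example_keys` are hypothetical, and the distribution-dependent function shown here simply uses the empirical cumulative distribution of a known key set, which is one plausible way to "use knowledge of the distribution" and not necessarily the form the authors tested.

```python
from bisect import bisect_right


def division_hash(key: int, table_size: int) -> int:
    """Division method: the address is the remainder of key / table_size."""
    return key % table_size


def make_distribution_hash(known_keys, table_size: int):
    """Build a hash function from the empirical distribution of known_keys.

    Assumption (illustrative only): the address is the key's rank within the
    known key set, scaled to the table size. Keys that fall in dense regions
    of the key space are therefore spread over more addresses.
    """
    sorted_keys = sorted(known_keys)
    n = len(sorted_keys)

    def hash_fn(key: int) -> int:
        # Number of known keys <= key, i.e. n times the empirical CDF at key.
        rank = bisect_right(sorted_keys, key)
        # Scale the rank into the address range [0, table_size - 1].
        return min(table_size - 1, rank * table_size // (n + 1))

    return hash_fn


# A deliberately unfavourable key set for the division method:
# every key is a multiple of the table size.
example_keys = [1000 + 10 * i for i in range(10)]
table_size = 10

distribution_hash = make_distribution_hash(example_keys, table_size)

division_addresses = [division_hash(k, table_size) for k in example_keys]
distribution_addresses = [distribution_hash(k) for k in example_keys]

print("division:    ", division_addresses)
print("distribution:", distribution_addresses)
```

On this contrived key set the division method sends every key to the same address, while the rank-based function assigns each key its own address. Keys inserted later are still hashed by the same function without redefining it, although its quality would be expected to degrade as the key distribution drifts away from the one it was built from, which is the trade-off the abstract describes.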