On a partitioning problem

  • Authors: C. T. Yu; M. K. Siu; K. Lam

  • Affiliations: Univ. of Alberta, Edmonton, Alta., Canada (all three authors)

  • Venue: ACM Transactions on Database Systems (TODS)
  • Year: 1978

Abstract

This paper investigates the problem of locating a set of “boundary points” of a large number of records. Conceptually, the boundary points partition the records into subsets of roughly the same number of elements, such that the key values of the records in one subset are all smaller or all larger than those of the records in another subset. We guess the locations of the boundary points by linear interpolation and check their accuracy by reading the key values of the records on one pass. This process is repeated until all boundary points are determined. Clearly, this problem can also be solved by performing an external tape sort. Both analytical and empirical results indicate that the number of passes required is small in comparison with that in an external tape sort. This kind of record partitioning may be of interest in setting up a statistical database system.
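The guess-and-check loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm as published. It assumes numeric keys, a hypothetical `read_keys()` callable that returns a fresh iterator over the keys (standing in for one sequential pass over the file), and a tolerance `tol` on how far a subset boundary may deviate from its target rank. The function name `find_boundaries` and all parameters are illustrative.

```python
def find_boundaries(read_keys, k, tol=1.0, max_passes=50):
    """Locate k-1 boundary values that split the keys into k roughly equal subsets.

    read_keys: hypothetical callable returning a fresh iterator of numeric keys.
    Returns (boundaries, passes); a boundary is None if it was not resolved
    within max_passes checking passes (the initial scan is not counted).
    """
    # Initial scan: record count and the extreme key values.
    n, lo, hi = 0, float("inf"), float("-inf")
    for key in read_keys():
        n += 1
        lo, hi = min(lo, key), max(hi, key)
    if n == 0 or k < 2:
        raise ValueError("need at least one record and k >= 2")

    # Target rank of boundary i: the number of keys that should be <= it.
    targets = [i * n / k for i in range(1, k)]
    # One bracket per boundary: [x_lo, count_lo, x_hi, count_hi].
    brackets = [[lo, 0, hi, n] for _ in targets]
    result = [None] * len(targets)
    passes = 0

    while None in result and passes < max_passes:
        pending = [i for i, r in enumerate(result) if r is None]

        # Guess each unresolved boundary by linear interpolation in its bracket.
        guesses = {}
        for i in pending:
            x_lo, c_lo, x_hi, c_hi = brackets[i]
            frac = (targets[i] - c_lo) / (c_hi - c_lo)
            guesses[i] = x_lo + frac * (x_hi - x_lo)

        # One pass over the records checks every pending guess at once.
        counts = dict.fromkeys(pending, 0)
        for key in read_keys():
            for i in pending:
                if key <= guesses[i]:
                    counts[i] += 1
        passes += 1

        # Accept guesses that are close enough; otherwise tighten the bracket.
        for i in pending:
            g, c = guesses[i], counts[i]
            if abs(c - targets[i]) <= tol:
                result[i] = g
            elif c < targets[i]:
                brackets[i][0], brackets[i][1] = g, c
            else:
                brackets[i][2], brackets[i][3] = g, c

    return result, passes


# Usage: split 1000 records into 4 subsets, allowing a rank error of 5.
keys = list(range(999, -1, -1))
boundaries, passes = find_boundaries(lambda: iter(keys), k=4, tol=5)
print(boundaries, passes)
```

In this sketch all pending boundary guesses are checked in the same pass, so the number of passes is governed by the slowest boundary to settle. Uniformly spread keys tend to settle almost immediately, while skewed key distributions need more rounds of interpolation.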