Random sampling from hash files

  • Authors:
  • Frank Olken; Doron Rotem; Ping Xu

  • Affiliations:
  • Computer Science Research & Development Dept., Information and Computing Sciences Div., Lawrence Berkeley Laboratory, 1 Cyclotron Road, Berkeley, CA; Computer Science Research & Development Dept., Information and Computing Sciences Div., Lawrence Berkeley Laboratory, 1 Cyclotron Road, Berkeley, CA; Computer Science Dept., San Francisco State University, San Francisco, CA

  • Venue:
  • SIGMOD '90: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 1990

Abstract

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms for hash files organized by static and by dynamic hashing methods. The static methods considered are open addressing hash files and hash files with separate overflow chains. The dynamic hashing methods considered are Linear Hash files [Lit80] and Extendible Hash files [FNPS79]. We give the cost of sampling in terms of the cost of successfully searching a hash file, and we show how to exploit features of the dynamic hashing methods to improve sampling efficiency.
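A common way to draw a simple random sample from a hash file is acceptance/rejection. The sketch below illustrates that generic technique for the two static organizations named above. It is not claimed to be the paper's exact algorithm. The in-memory layout is assumed: a list of slots with `None` marking an empty slot for open addressing, and a list of bucket chains plus a hypothetical bound `max_chain` on chain length for separate overflow chains. The function names are illustrative only.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_open_addressing(slots: Sequence[Optional[T]], rng: random.Random) -> T:
    """Draw one record uniformly from an open-addressing table.

    Assumption: None marks an empty slot. Probe a uniformly random slot
    and accept it if occupied; otherwise retry. Every occupied slot has the
    same acceptance probability per trial, so the result is uniform.
    """
    if all(s is None for s in slots):
        raise ValueError("empty hash file")
    while True:
        s = slots[rng.randrange(len(slots))]
        if s is not None:
            return s


def sample_chained(buckets: Sequence[Sequence[T]], max_chain: int,
                   rng: random.Random) -> T:
    """Draw one record uniformly from buckets with overflow chains.

    Pick a bucket uniformly, then a position uniformly in [0, max_chain),
    and accept if that position holds a record. Each record is selected with
    probability 1 / (len(buckets) * max_chain) per trial, hence uniformly,
    provided max_chain bounds every chain length (hypothetical parameter).
    """
    if not buckets or sum(len(b) for b in buckets) == 0:
        raise ValueError("empty hash file")
    if max_chain < max(len(b) for b in buckets):
        raise ValueError("max_chain must bound every chain length")
    while True:
        b = buckets[rng.randrange(len(buckets))]
        j = rng.randrange(max_chain)
        if j < len(b):
            return b[j]


if __name__ == "__main__":
    rng = random.Random(42)
    # Iterative sampling with replacement: repeat the single-draw routine.
    buckets = [[10, 11, 12], [20], []]
    print([sample_chained(buckets, 3, rng) for _ in range(5)])
```

The number of trials per accepted record is geometric: with n records, m buckets and bound `max_chain`, each trial succeeds with probability n / (m · max_chain), so a loose bound or many sparse buckets raises the expected number of bucket reads. For open addressing the success probability per probe is simply the load factor.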