On the complexity of designing optimal partial-match retrieval systems

  • Authors:
  • Shlomo Moran

  • Affiliations:
  • The Technion, Haifa, Israel

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 1983

Abstract

We consider the problem of designing an information retrieval system on which partial match queries have to be answered. Each record in the system consists of a list of attributes, and a partial match query specifies the values of some of the attributes. The records are stored in buckets in a secondary memory, and in order to answer a partial match query all the buckets that may contain a record satisfying the specifications of that query must be retrieved. The bucket in which a given record is stored is found by a multiple key hashing function, which maps each attribute to a string of a fixed number of bits. The address of that bucket is then represented by the string obtained by concatenating the strings to which the various attributes were mapped. A partial match query may specify only part of the bits in the string representing the address, and the larger the number of bits specified, the smaller the number of buckets that have to be retrieved in order to answer the query.

The optimization problem considered in this paper is that of deciding to how many bits each attribute should be mapped by the hashing function above, so that the expected number of buckets retrieved per query is minimized. Efficient solutions for special cases of this problem have been obtained in [1], [12], and [14]. It is shown that in general the problem is NP-hard, and that if P ≠ NP, it is also not fully approximable. Two heuristic algorithms for the problem are also given and compared.
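To make the objective concrete, the following is a minimal sketch of the cost being minimized, under assumptions that the abstract does not spell out: a query is identified with the set of attributes it specifies, each such set has a known probability, and a query that leaves u address bits unspecified retrieves 2^u buckets. The names `expected_buckets`, `best_allocation`, and `query_probs` are hypothetical, and the exhaustive search shown here is only an illustration; it is not one of the paper's two heuristics.

```python
from itertools import product


def expected_buckets(bits, query_probs):
    """Expected number of buckets retrieved per query.

    bits        -- tuple; bits[i] is the number of address bits given to attribute i
    query_probs -- dict mapping a frozenset of specified attribute indices
                   to the probability of that query type (assumed known)
    """
    total = 0.0
    for specified, prob in query_probs.items():
        # Bits of unspecified attributes are unknown, so every combination
        # of them names a bucket that must be retrieved.
        unknown_bits = sum(b for i, b in enumerate(bits) if i not in specified)
        total += prob * 2 ** unknown_bits
    return total


def best_allocation(num_attrs, total_bits, query_probs):
    """Exhaustively search all ways to split total_bits among the attributes.

    Enumerates every tuple in {0..total_bits}^num_attrs, so it is only
    practical for very small inputs.
    """
    best = None
    for bits in product(range(total_bits + 1), repeat=num_attrs):
        if sum(bits) != total_bits:
            continue
        cost = expected_buckets(bits, query_probs)
        if best is None or cost < best[1]:
            best = (bits, cost)
    return best


# Two attributes, 4 address bits, each attribute specified alone half the time.
uniform = {frozenset({0}): 0.5, frozenset({1}): 0.5}
print(best_allocation(2, 4, uniform))

# Attribute 0 is specified far more often, so it should receive more bits.
skewed = {frozenset({0}): 0.9, frozenset({1}): 0.1}
print(best_allocation(2, 4, skewed))
```

In this toy setting the uniform query mix favors an even split of the bits, while the skewed mix shifts all four bits to the frequently specified attribute. The search space grows quickly with the number of attributes and bits, which is in keeping with the hardness result stated above.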