On the complexity of designing optimal partial-match retrieval systems

Authors:
Shlomo Moran
Affiliations:
The Technion, Haifa, Israel
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1983

Citing 8
Cited 2

Optimal partial-match retrieval when fields are independently specified

ACM Transactions on Database Systems (TODS)
Fast Approximation Algorithms for the Knapsack and Sum of Subset Problems

Journal of the ACM (JACM)
Exact and Approximate Algorithms for Scheduling Nonidentical Processors

Journal of the ACM (JACM)
"Strong" NP-Completeness Results: Motivation, Examples, and Implications

Journal of the ACM (JACM)
Optimality Properties of Multiple-Key Hashing Functions

Journal of the ACM (JACM)
Attribute based file organization in a paged memory environment

Communications of the ACM
Principles of Database Systems

Principles of Database Systems
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness

Clustered multiattribute hash files

PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A compendium of key search references

ACM SIGIR Forum

Abstract

We consider the problem of designing an information retrieval system on which partial match queries have to be answered. Each record in the system consists of a list of attributes, and a partial match query specifies the values of some of the attributes. The records are stored in buckets in a secondary memory, and in order to answer a partial match query all the buckets that may contain a record satisfying the specifications of that query must be retrieved. The bucket in which a given record is stored is found by a multiple key hashing function, which maps each attribute to a string of a fixed number of bits. The address of that bucket is then represented by the string obtained by concatenating the strings to which the various attributes are mapped. A partial match query may specify only part of the bits in the string representing the address, and the larger the number of bits specified, the smaller the number of buckets that have to be retrieved in order to answer the query.

The optimization problem considered in this paper is that of deciding how many bits the hashing function above should assign to each attribute, so that the expected number of buckets retrieved per query is minimized. Efficient solutions for special cases of this problem have been obtained in [1], [12], and [14]. It is shown that in general the problem is NP-hard, and that if P ≠ NP, it is also not fully approximable. Two heuristic algorithms for the problem are also given and compared.
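The cost model described in the abstract can be illustrated with a small sketch. It assumes a workload given as a probability for each set of specified attributes, and it finds the best bit allocation by exhaustive search. Both the workload and the function names (`buckets_retrieved`, `expected_buckets`, `best_allocation`) are hypothetical. The exhaustive search is not one of the paper's two heuristic algorithms; it only shows what is being minimized, and it is practical only for tiny instances.

```python
from itertools import product


def buckets_retrieved(bits, specified):
    """Buckets that must be read when the attributes in `specified` are given.

    bits[i] is the number of address bits assigned to attribute i. Every bit
    belonging to an unspecified attribute is unknown, so the number of
    candidate buckets doubles for each such bit.
    """
    unspecified_bits = sum(b for i, b in enumerate(bits) if i not in specified)
    return 2 ** unspecified_bits


def expected_buckets(bits, query_probs):
    """Expected buckets per query under the assumed query distribution."""
    return sum(p * buckets_retrieved(bits, q) for q, p in query_probs.items())


def best_allocation(num_attrs, total_bits, query_probs):
    """Exhaustive search over all allocations that use exactly total_bits bits."""
    best = None
    for bits in product(range(total_bits + 1), repeat=num_attrs):
        if sum(bits) != total_bits:
            continue
        cost = expected_buckets(bits, query_probs)
        if best is None or cost < best[1]:
            best = (bits, cost)
    return best


# Hypothetical workload: keys are the sets of specified attribute indices,
# values are the assumed probabilities of each query type.
query_probs = {
    frozenset({0}): 0.5,
    frozenset({1}): 0.3,
    frozenset({0, 2}): 0.2,
}

allocation, cost = best_allocation(num_attrs=3, total_bits=4, query_probs=query_probs)
print(allocation, cost)
```

For this made-up workload the search assigns three bits to attribute 0, one bit to attribute 1, and none to attribute 2, since attribute 2 is only ever specified together with attribute 0. The number of candidate allocations grows quickly with the number of attributes and bits, which is consistent with the abstract's point that heuristics are of interest for the general problem.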