Optimal partial-match retrieval when fields are independently specified
ACM Transactions on Database Systems (TODS)
Fast Approximation Algorithms for the Knapsack and Sum of Subset Problems
Journal of the ACM (JACM)
Exact and Approximate Algorithms for Scheduling Nonidentical Processors
Journal of the ACM (JACM)
"Strong" NP-Completeness Results: Motivation, Examples, and Implications
Journal of the ACM (JACM)
Optimality Properties of Multiple-Key Hashing Functions
Journal of the ACM (JACM)
Attribute based file organization in a paged memory environment
Communications of the ACM
Principles of Database Systems
Principles of Database Systems
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Clustered multiattribute hash files
PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A compendium of key search references
ACM SIGIR Forum
We consider the problem of designing an information retrieval system on which partial match queries have to be answered. Each record in the system consists of a list of attributes, and a partial match query specifies the values of some of the attributes. The records are stored in buckets in a secondary memory, and in order to answer a partial match query, all the buckets that may contain a record satisfying the specifications of that query must be retrieved. The bucket in which a given record is stored is found by a multiple-key hashing function, which maps each attribute to a string of a fixed number of bits. The address of that bucket is then represented by the string obtained by concatenating the strings to which the various attributes were mapped. A partial match query may specify only some of the bits in the string representing the address, and the larger the number of bits specified, the smaller the number of buckets that have to be retrieved in order to answer the query.

The optimization problem considered in this paper is that of deciding to how many bits each attribute should be mapped by the hashing function above, so that the expected number of buckets retrieved per query is minimized. Efficient solutions for special cases of this problem have been obtained in [1], [12], and [14]. It is shown that in general the problem is NP-hard, and that if P ≠ NP, it is also not fully approximable. Two heuristic algorithms for the problem are also given and compared.
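The cost model described in the abstract can be illustrated with a small sketch. It assumes that a query specifying a set of attributes fixes all the address bits of those attributes, so it must retrieve 2^u buckets, where u is the total number of bits assigned to the unspecified attributes. The function names, the query distribution, and the exhaustive search are hypothetical and chosen only for illustration; they are not the heuristic algorithms of the paper, and the search is exponential in the number of attributes.

```python
from itertools import product


def buckets_retrieved(bits, specified):
    """Buckets a query must read when it specifies the attributes in `specified`.

    bits[i] is the number of address bits assigned to attribute i (assumption:
    a specified attribute fixes all of its bits, an unspecified one fixes none).
    """
    unspecified_bits = sum(b for i, b in enumerate(bits) if i not in specified)
    return 2 ** unspecified_bits


def expected_buckets(bits, query_probs):
    """Expected buckets per query; query_probs maps frozenset of attributes -> probability."""
    return sum(p * buckets_retrieved(bits, s) for s, p in query_probs.items())


def best_allocation(num_attrs, total_bits, query_probs):
    """Exhaustively try every split of total_bits among the attributes (illustration only)."""
    best = None
    for bits in product(range(total_bits + 1), repeat=num_attrs):
        if sum(bits) != total_bits:
            continue  # the address length is fixed, so all bits must be used
        cost = expected_buckets(bits, query_probs)
        if best is None or cost < best[1]:
            best = (bits, cost)
    return best


# Hypothetical workload: three attributes, a 4-bit bucket address.
query_probs = {
    frozenset({0}): 0.5,     # half of the queries specify attribute 0 only
    frozenset({1}): 0.3,     # attribute 1 only
    frozenset({0, 2}): 0.2,  # attributes 0 and 2 together
}
bits, cost = best_allocation(3, 4, query_probs)
print(bits, cost)
```

In this toy workload the frequently specified attribute receives the most bits, which matches the intuition in the abstract: the more address bits a typical query pins down, the fewer buckets it has to read.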