Analysis of multiterm queries in a dynamic signature file organization

  • Authors:
  • Deniz Aktug;Fazli Can

  • Venue:
  • SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1993

Abstract

Our analysis combines the concerns of signature extraction and signature file organization, which have usually been treated as separate issues. We also relax the uniform-frequency and single-term query assumptions and provide a comprehensive analysis for multiterm query environments where terms can be classified based on their query and database occurrence frequencies. The performance of three superimposed signature generation schemes is explored as they are applied to one dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) lets all terms set the same number of bits regardless of their discriminatory power, whereas the second and third schemes (MMS and MMM) emphasize the terms with high query and low database occurrence frequencies. Of these three schemes, only MMM takes the probability distribution of the number of query terms into account in finding the optimal mapping strategy. The derivation of the performance evaluation formulas is provided together with the results of various experimental settings. Suggestions on how to implement the given techniques in real-life cases are also provided. Results indicate that MMM outperforms the other methods as the gap between the discriminatory power of the terms gets larger. The absolute value of the savings provided by MMM reaches a maximum for the high query weight case. However, as database size increases, the extra savings decline sharply for high-weight queries and moderately for low-weight queries.
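
As a rough illustration of the superimposed coding idea the abstract refers to, the following Python sketch builds a record signature by OR-ing together per-term bit patterns and tests a query signature by bitwise containment. It is not the paper's implementation: the hash function (SHA-256), the signature width, the bit counts, and the names `term_bits`, `signature`, `may_match`, `uniform`, and `weighted` are all assumptions made for the example. It also assumes that "emphasizing" a term means letting it set more bits, and it does not attempt the optimal mapping that MMM computes, nor the linear hashing file organization.

```python
import hashlib


def term_bits(term: str, num_bits: int, width: int) -> int:
    """Return a `width`-bit mask with exactly `num_bits` positions set for `term`."""
    if not 0 <= num_bits <= width:
        raise ValueError("num_bits must be between 0 and width")
    mask = 0
    set_count = 0
    counter = 0
    while set_count < num_bits:
        # Rehash with a counter until enough distinct positions are found.
        digest = hashlib.sha256(f"{term}:{counter}".encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % width
        if not (mask >> pos) & 1:
            mask |= 1 << pos
            set_count += 1
        counter += 1
    return mask


def signature(terms, bits_per_term, width: int) -> int:
    """Superimpose (bitwise OR) the bit patterns of all terms."""
    sig = 0
    for term in terms:
        sig |= term_bits(term, bits_per_term(term), width)
    return sig


def may_match(record_sig: int, query_sig: int) -> bool:
    """True if every bit set in the query is also set in the record.

    A True result can be a false drop; a False result is definitive.
    """
    return (record_sig & query_sig) == query_sig


WIDTH = 64  # assumed signature width for the example


def uniform(term: str) -> int:
    # Uniform allocation: every term sets the same number of bits.
    return 3


# Hypothetical set of terms judged highly discriminatory.
HIGH_WEIGHT = {"hashing"}


def weighted(term: str) -> int:
    # Weighted allocation: emphasized terms set more bits (assumed values).
    return 6 if term in HIGH_WEIGHT else 2


record_terms = ["linear", "hashing", "signature"]
record_uniform = signature(record_terms, uniform, WIDTH)
record_weighted = signature(record_terms, weighted, WIDTH)
```

A multiterm query is encoded with the same allocation function as the records, so a record that contains all query terms always passes the containment test, for instance `may_match(record_weighted, signature(["linear", "hashing"], weighted, WIDTH))`. Records that pass without containing the terms are false drops, which is the cost that the choice of bits per term aims to reduce.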