Partitioned signature files: design issues and performance evaluation

  • Authors: Dik Lun Lee; Chun-Wu Leng

  • Affiliations: Ohio State Univ., Columbus; Ohio State Univ., Columbus

  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 1989

Abstract

A signature file acts as a filtering mechanism to reduce the amount of text that needs to be searched for a query. Unfortunately, the signature file itself must be exhaustively searched, resulting in degraded performance for a large file size. We propose to use a deterministic algorithm to divide a signature file into partitions, each of which contains signatures with the same “key.” The signature keys in a partition can be extracted and represented as the partition's key. The search can then be confined to the subset of partitions whose keys match the query key. Our main concern here is to study methods for obtaining the keys and their performance in terms of their ability to reduce the search space.

Owing to the reduction of search space, partitioning a signature file has a direct benefit in a sequential search (single-processor) environment. In a parallel environment, search can be conducted in parallel effectively by allocating one or more partitions to a processor. Partitioning the signature file with a deterministic method (as opposed to a random partitioning scheme) provides intraquery parallelism as well as interquery parallelism.

In this paper, we outline the criteria for evaluating partitioning schemes. Three algorithms are described and studied. An analytical study of the performance of the algorithms is provided and the results are verified with simulation.
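
The partition-and-filter idea in the abstract can be illustrated with a small sketch. The abstract does not say how keys are derived, so the code below assumes, purely for illustration, that a key is the leading few bits of a signature and that signatures use superimposed coding (a signature qualifies when it contains every bit set in the query signature). The names `SIG_BITS`, `KEY_BITS`, `key_of`, `build_partitions`, and `search` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

SIG_BITS = 16  # assumed signature width (hypothetical)
KEY_BITS = 4   # assumed key width: the leading bits of a signature (hypothetical)


def key_of(signature: int) -> int:
    """Return the leading KEY_BITS bits of a SIG_BITS-wide signature."""
    return signature >> (SIG_BITS - KEY_BITS)


def build_partitions(signatures):
    """Group (doc_id, signature) pairs so each partition shares one key."""
    partitions = defaultdict(list)
    for doc_id, sig in signatures:
        partitions[key_of(sig)].append((doc_id, sig))
    return partitions


def search(partitions, query_sig: int):
    """Scan only partitions whose key covers the query key.

    Returns (sorted matching doc ids, number of partitions scanned).
    """
    query_key = key_of(query_sig)
    hits = []
    scanned = 0
    for part_key, entries in partitions.items():
        # A partition can hold matches only if its key contains
        # every bit that is set in the query key.
        if (part_key & query_key) != query_key:
            continue
        scanned += 1
        for doc_id, sig in entries:
            if (sig & query_sig) == query_sig:
                hits.append(doc_id)
    return sorted(hits), scanned


signatures = [
    (1, 0b1010_0000_0000_0001),
    (2, 0b1010_0000_1100_0000),
    (3, 0b0100_0011_0000_0000),
    (4, 0b1110_0000_0000_0001),
]
partitions = build_partitions(signatures)
matches, scanned = search(partitions, 0b1000_0000_0000_0001)
print(matches, scanned, len(partitions))
```

In this toy data the four signatures fall into three partitions, and the query skips the one partition whose key cannot cover the query key. Because the assignment of a signature to a partition is deterministic, the qualifying partitions could also be handed to separate processors, which is the parallel setting the abstract mentions.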