Efficient top-k retrieval with signatures

  • Authors:
  • Timothy Chappell; Shlomo Geva; Anthony Nguyen; Guido Zuccon

  • Affiliations:
  • Queensland University of Technology, Brisbane, Australia; Queensland University of Technology, Brisbane, Australia; Australian e-Health Research Centre, CSIRO, Brisbane, Australia; Australian e-Health Research Centre, CSIRO, Brisbane, Australia

  • Venue:
  • Proceedings of the 18th Australasian Document Computing Symposium
  • Year:
  • 2013

Abstract

This paper describes a new method for indexing and searching large collections of binary signatures to find similar signatures efficiently, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, a complete search for a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. The cost of this exhaustive comparison grows with the size of the collection and quickly becomes excessive for large collections, which limits the applicability of signatures. Our method efficiently finds similar signatures in very large collections, trading increased memory use and some loss of precision for greatly improved search speed. Experimental results demonstrate that our approach can find a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
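
To make the scalability problem concrete, the following is a minimal sketch of the exhaustive search that the abstract describes as the costly baseline. It is not the indexing method proposed in the paper. It assumes signatures are stored as Python integers used as bit vectors; the names `hamming` and `top_k_exhaustive` are hypothetical and chosen only for illustration.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Hamming distance between two signatures held as integers."""
    # XOR leaves a 1 bit wherever the two signatures differ.
    return bin(a ^ b).count("1")


def top_k_exhaustive(query: int, signatures: list[int], k: int) -> list[tuple[int, int]]:
    """Return the k nearest signatures as (distance, index) pairs.

    Every signature in the collection is compared with the query,
    so the work is one Hamming distance calculation per signature.
    """
    scored = ((hamming(query, sig), i) for i, sig in enumerate(signatures))
    # nsmallest orders by distance, breaking ties by the lower index.
    return heapq.nsmallest(k, scored)


# Illustrative 4-bit signatures (made-up data, not from the paper).
collection = [0b0000, 0b1111, 0b0011, 0b0001]
print(top_k_exhaustive(0b0001, collection, k=2))  # [(0, 3), (1, 0)]
```

Because each query touches every signature, query time in this sketch grows in proportion to the collection size, which is the cost that an index over the signatures is intended to avoid.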