Data structures for accelerating Tanimoto queries on real valued vectors

  • Authors: Thomas G. Kristensen; Christian N. S. Pedersen
  • Affiliations: Bioinformatics Research Center, Aarhus University, Aarhus, Denmark (both authors)
  • Venue: WABI'10 Proceedings of the 10th international conference on Algorithms in bioinformatics
  • Year: 2010

Abstract

Previous methods for accelerating Tanimoto queries have been based on representing molecules as bit strings. No work has examined how to accelerate Tanimoto queries on real-valued descriptors, even though these offer a much more fine-grained measure of similarity between molecules. This study utilises a recently discovered reduction from Tanimoto queries to distance queries in Euclidean space to accelerate Tanimoto queries using standard metric data structures. The presented experiments show that a significant speedup is possible and that, on vectors generated from molecular data, general metric data structures are better suited than a data structure tailored for Euclidean space.
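
To make the idea of such a reduction concrete, the following is a minimal sketch and not necessarily the formulation used in the paper. It assumes the usual real-valued Tanimoto coefficient T(A, B) = A·B / (|A|² + |B|² − A·B) and a threshold t in (0, 1]. Rearranging T(A, B) ≥ t and completing the square gives |B − cA| ≤ r with c = (1 + t) / (2t) and r = |A| · sqrt((1 − t)(1 + 3t)) / (2t), so a Tanimoto range query becomes a Euclidean ball query. The function names `tanimoto`, `query_ball` and the two range-query helpers are hypothetical and chosen for illustration only; the linear scans stand in for a real metric data structure.

```python
import math


def tanimoto(a, b):
    """Real-valued Tanimoto coefficient A.B / (|A|^2 + |B|^2 - A.B)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # The denominator is zero only when both vectors are zero;
    # treating that case as similarity 1.0 is a convention assumed here.
    return dot / denom if denom > 0 else 1.0


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def query_ball(a, t):
    """Return (center, radius) of the ball holding all B with T(a, B) >= t."""
    if not 0.0 < t <= 1.0:
        raise ValueError("threshold t must lie in (0, 1]")
    scale = (1.0 + t) / (2.0 * t)
    norm_a = math.sqrt(sum(x * x for x in a))
    radius = norm_a * math.sqrt((1.0 - t) * (1.0 + 3.0 * t)) / (2.0 * t)
    return [scale * x for x in a], radius


def range_query_by_tanimoto(database, a, t):
    """Reference answer: scan the database with the Tanimoto coefficient."""
    return [b for b in database if tanimoto(a, b) >= t]


def range_query_by_ball(database, a, t):
    """Same query phrased as a Euclidean ball query (linear scan here;
    a metric index could answer the ball query instead)."""
    center, radius = query_ball(a, t)
    return [b for b in database if euclidean(center, b) <= radius]
```

Because the query is now an ordinary ball query in Euclidean space, any index that supports range search under a metric could in principle replace the linear scan in `range_query_by_ball`; which index performs best is the empirical question the abstract addresses.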