Turbo similarity searching: Effect of fingerprint and dataset on virtual-screening performance

  • Authors:
  • Eleanor J. Gardiner;Valerie J. Gillet;Maciej Haranczyk;Jérôme Hert;John D. Holliday;Nurul Malim;Yogendra Patel;Peter Willett

  • Affiliations:
  • Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK (all authors)

  • Venue:
  • Statistical Analysis and Data Mining
  • Year:
  • 2009

Abstract

Turbo similarity searching (TSS) uses information about the nearest neighbors retrieved by a conventional chemical similarity search to increase the effectiveness of virtual screening, with data fusion used to combine the nearest-neighbor information. A previous paper suggested that the approach was highly effective in operation; this paper tests it further using a range of databases and structural representations. Searches were carried out on three different databases of chemical structures, using seven different types of fingerprints, as well as molecular holograms, physicochemical properties, topological indices and reduced graphs. The results show that TSS can indeed enhance retrieval, but normally only if the similarity search that acts as its starting point has already achieved a reasonable level of search effectiveness. In other cases, a modified version of TSS that uses the nearest-neighbor information for approximate machine learning can be used effectively. Though useful for qualitative (active-inactive) predictions of biological activity, TSS does not appear to exhibit any predictive power when quantitative property data are available. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 103-114, 2009
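
The core procedure described in the abstract (run a conventional similarity search, take its nearest neighbors as additional reference structures, then fuse the resulting rankings) can be illustrated with a minimal Python sketch. The abstract does not specify the similarity coefficient, the fusion rule, or the number of nearest neighbors, so the Tanimoto coefficient, MAX-score fusion, and the parameter `k` below are assumptions made for illustration; the function names and the set-of-bits fingerprint representation are likewise hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled as the set of its "on" bit positions (assumption).
Fingerprint = FrozenSet[int]
Ranking = List[Tuple[str, float]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(query: Fingerprint,
                      database: Dict[str, Fingerprint]) -> Ranking:
    """Conventional similarity search: rank every database molecule
    by decreasing similarity to a single reference structure."""
    scores = [(mol_id, tanimoto(query, fp)) for mol_id, fp in database.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


def turbo_search(query: Fingerprint,
                 database: Dict[str, Fingerprint],
                 k: int = 5) -> Ranking:
    """Turbo-style search: use the k nearest neighbors of the query as
    extra reference structures and fuse all rankings.
    MAX-score fusion is an assumption, not stated in the abstract."""
    initial = similarity_search(query, database)
    neighbours = [database[mol_id] for mol_id, _ in initial[:k]]
    references = [query] + neighbours

    fused: Dict[str, float] = {}
    for ref in references:
        for mol_id, score in similarity_search(ref, database):
            # Keep the best score a molecule achieves against any reference.
            fused[mol_id] = max(fused.get(mol_id, 0.0), score)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch a molecule that shares nothing with the original query can still be promoted if it resembles one of the query's nearest neighbors. The same mechanism is consistent with the abstract's caveat: if the initial search returns poor neighbors, the fused ranking inherits their noise.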