Efficient Filtration of Sequence Similarity Search Through Singular Value Decomposition

  • Venue: BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
  • Year: 2004

Abstract

Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of whole-genome sequence similarity search into an approximate vector comparison in the well-established multidimensional vector space. We propose the application of the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to effectively reduce the search space and the running time of the search operation. Our empirical results on a Prokaryote and a Eukaryote DNA contig dataset demonstrate effective filtration to prune non-relevant portions of the database, with up to 2.3 times faster running time compared with the q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics, such as BLAST, QUASAR, and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the imposed trade-offs.
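
The abstract does not say how sequences are turned into vectors or how the reduced space is queried, so the following is only an illustrative sketch of the general idea and not the authors' implementation. It assumes that the database is cut into fixed-length windows, that each window is represented by its q-gram count vector, and that candidates are kept when their Euclidean distance to the query in the reduced space falls below a threshold. The names qgram_vector, build_filter, and candidates, and the parameters q, k, and radius, are hypothetical.

```python
from itertools import product

import numpy as np

ALPHABET = "ACGT"


def qgram_vector(seq, q=2):
    """Count the overlapping q-grams of seq over the DNA alphabet."""
    index = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        j = index.get(seq[i:i + q])
        if j is not None:  # skip q-grams containing other symbols, e.g. N
            vec[j] += 1
    return vec


def build_filter(windows, q=2, k=3):
    """Project the window vectors onto the top-k right singular vectors."""
    A = np.array([qgram_vector(w, q) for w in windows])
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    basis = Vt[:k].T          # orthonormal columns, shape (4**q, k)
    return basis, A @ basis   # reduced database vectors, shape (n, k)


def candidates(query, basis, reduced, q=2, radius=2.0):
    """Return indices of windows within radius of the query in reduced space."""
    z = qgram_vector(query, q) @ basis
    dist = np.linalg.norm(reduced - z, axis=1)
    return np.flatnonzero(dist <= radius)


# Hypothetical toy database of four windows (assumption, not paper data).
windows = ["ACGTACGTAC", "ACGTACGTAA", "GGGGGGGGGG", "TTTTTCCCCC"]
basis, reduced = build_filter(windows, q=2, k=2)
hits = candidates("ACGTACGTAC", basis, reduced, q=2, radius=1.5)
```

Because the basis columns are orthonormal, the projection can only shrink Euclidean distances between q-gram vectors. Under these assumptions, a window that is within radius of the query in the full q-gram space is never discarded by the filter, and the surviving windows could then be passed to a heuristic such as BLAST for verification. The filter can still let through false positives, which is one side of the trade-off the abstract mentions.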