Efficient Filtration of Sequence Similarity Search Through Singular Value Decomposition

Venue:
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Year:
2004

Citing 0
Cited 2

Survey on index based homology search algorithms
The Journal of Supercomputing

Filtering bio-sequence based on sequence descriptor
BioDM'06 Proceedings of the 2006 international conference on Data Mining for Biomedical Applications


Abstract

Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of whole-genome sequence similarity search into an approximate vector comparison in the well-established multidimensional vector space. We propose the application of the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to effectively reduce the search space and the running time of the search operation. Our empirical results on a Prokaryote and a Eukaryote DNA contig dataset demonstrate effective filtration to prune non-relevant portions of the database, with up to 2.3 times faster running time compared with the q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics such as BLAST, QUASAR and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the imposed trade-offs.
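
The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`qgram_vector`, `build_filter`, `candidates`), the use of fixed database windows, and the parameters `q`, `k` and `radius` are all assumptions chosen for illustration. Each database window is mapped to a q-gram count vector, the vectors are projected onto the top `k` right singular vectors, and a query is compared against the reduced vectors to keep only nearby windows as candidates for a full search.

```python
from itertools import product

import numpy as np


def qgram_vector(seq, q=3, alphabet="ACGT"):
    """Count occurrences of every q-gram over the alphabet in seq."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        j = index.get(seq[i:i + q])
        if j is not None:  # skip q-grams containing symbols outside the alphabet
            vec[j] += 1
    return vec


def build_filter(windows, q=3, k=8):
    """Project the q-gram vectors of all windows onto the top-k singular directions."""
    X = np.array([qgram_vector(w, q) for w in windows])
    # Rows of Vt are orthonormal directions ordered by singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T          # shape: (len(alphabet)**q, at most k)
    reduced = X @ basis       # low-dimensional representation of each window
    return basis, reduced


def candidates(query, basis, reduced, q=3, radius=2.0):
    """Return indices of windows whose reduced vector lies within radius of the query."""
    z = qgram_vector(query, q) @ basis
    dist = np.linalg.norm(reduced - z, axis=1)
    return np.flatnonzero(dist <= radius)
```

Because the basis columns are orthonormal, the distance between two projected vectors never exceeds the distance between the original q-gram vectors. Under the assumptions of this sketch, a window within `radius` of the query in the full q-gram space is therefore never discarded by the filter; the cost of the reduction is extra false positives, which a subsequent exact or heuristic search would have to reject.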