Survey on index based homology search algorithms
The Journal of Supercomputing
Filtering bio-sequence based on sequence descriptor
BioDM'06 Proceedings of the 2006 international conference on Data Mining for Biomedical Applications
Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to mitigate the curse of dimensionality. This paper proposes a novel approach that maps the problem of whole-genome sequence similarity search onto an approximate vector comparison in the well-established multidimensional vector space. We propose applying the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to effectively reduce the search space and the running time of the search operation. Our empirical results on a Prokaryote and a Eukaryote DNA contig dataset demonstrate effective filtration that prunes non-relevant portions of the database, with up to 2.3 times faster running time compared with the q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics, such as BLAST, QUASAR and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the imposed trade-offs.
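The abstract does not spell out the pipeline, so the following is only a minimal sketch of how an SVD-based filtration step could look, not the authors' implementation. It assumes that database windows are represented as q-gram frequency vectors, that the top k right singular vectors serve as the reduced basis, and that candidates are selected by Euclidean distance in the reduced space. The function names (`qgram_vector`, `build_filter`, `candidates`) and the parameters `q`, `k` and `radius` are hypothetical, chosen for illustration.

```python
from itertools import product

import numpy as np


def qgram_vector(seq, q=3):
    """Count every DNA q-gram in seq, giving a vector with 4**q entries."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=q))}
    vec = np.zeros(len(index))
    for i in range(len(seq) - q + 1):
        j = index.get(seq[i:i + q])
        if j is not None:  # skip q-grams containing symbols outside ACGT
            vec[j] += 1
    return vec


def build_filter(windows, q=3, k=8):
    """Project the q-gram vectors of all windows onto the top-k singular directions."""
    a = np.array([qgram_vector(w, q) for w in windows])
    # Thin SVD: a = u @ diag(s) @ vt, where the rows of vt are orthonormal.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    basis = vt[:k]            # shape (at most k, 4**q)
    reduced = a @ basis.T     # low-dimensional image of each window
    return basis, reduced


def candidates(query, basis, reduced, radius, q=3):
    """Return indices of windows within `radius` of the query in the reduced space."""
    z = basis @ qgram_vector(query, q)
    dist = np.linalg.norm(reduced - z, axis=1)
    return np.flatnonzero(dist <= radius)


windows = ["ACGTACGTAC", "TTTTTTTTTT", "ACGTACGAAC", "GGGCCCGGGC"]
basis, reduced = build_filter(windows, q=2, k=2)
hits = candidates("ACGTACGTAC", basis, reduced, radius=1.0, q=2)
```

Because the rows of `basis` are orthonormal, the distance between two vectors in the reduced space never exceeds their distance in the full q-gram space. A window whose full q-gram vector lies within `radius` of the query therefore always survives the filter; the cost of the reduction is false positives, which a downstream search heuristic would still need to verify.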