A novel filtration method in biological sequence databases
Pattern Recognition Letters
Survey on index based homology search algorithms
The Journal of Supercomputing
Filtering bio-sequence based on sequence descriptor
BioDM'06 Proceedings of the 2006 international conference on Data Mining for Biomedical Applications
Hi-index | 0.00 |
The problem of proximity search in biological databases is addressed. We study vector transformations and conduct the application of DFT(Discrete Fourier Transformation) and DWT(Discrete Wavelet Transformation, Haar) dimensionality reduction techniques for DNA sequence proximity search to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA contig databases demonstrate up to 50-fold filtration ratio of the search space, up to 13 times faster filtration. The proposed transformation techniques may easily be integrated as a preprocessing phase on top of the current existing similarity search heuristics such as BLAST [1], PattenHunter [11], FastA [17], QUASAR [4] and to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches, and discuss the imposed trade-offs.