Indexing Genomic Databases for Fast Homology Searching

Authors:
Twee-Hee Ong;Kian-Lee Tan;Hao Wang
Affiliations:
-;-;-
Venue:
DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications
Year:
2002

Abstract

Genomic sequence databases have been widely used by molecular biologists for homology searching. However, as amino acid and nucleotide databases grow in size at an alarming rate, the traditional brute-force approach of comparing a query sequence against every database sequence is becoming prohibitively expensive. In this paper, we re-examine the problem of searching for homology in large protein databases. We propose a novel filter-and-refine approach to speed up the search process. The scheme operates in two phases. In the filtering phase, a signature-based scheme quickly identifies a set of candidate sequences that is small relative to the whole database. In the refinement phase, the query sequence is matched against the sequences in the candidate set using any local alignment strategy. Our preliminary experimental results show that, compared to FASTA, the proposed method yields significant savings in computation without sacrificing the accuracy of the answers.
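The two-phase scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how signatures are built, so everything below is an assumption made for illustration: the signature is taken to be the set of k-mers in a sequence, the filter keeps sequences sharing at least `min_shared` k-mers with the query, and refinement uses a plain Smith-Waterman score with a linear gap penalty. The function names and parameters (`kmer_signature`, `filter_candidates`, `smith_waterman`, `search`, `k`, `min_shared`) are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple


def kmer_signature(seq: str, k: int = 3) -> FrozenSet[str]:
    """Assumed signature: the set of all length-k substrings of seq."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


def filter_candidates(query: str, database: Dict[str, str],
                      k: int = 3, min_shared: int = 2) -> List[str]:
    """Filtering phase: keep sequences whose signature overlaps the query's."""
    q_sig = kmer_signature(query, k)
    return [name for name, seq in database.items()
            if len(q_sig & kmer_signature(seq, k)) >= min_shared]


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Refinement phase: best local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: Dict[str, str],
           k: int = 3, min_shared: int = 2) -> List[Tuple[str, int]]:
    """Filter first, then align only the surviving candidates."""
    candidates = filter_candidates(query, database, k, min_shared)
    scored = [(name, smith_waterman(query, database[name]))
              for name in candidates]
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy protein-like strings, invented for this example only.
    db = {"s1": "MKVLAAGIVG", "s2": "MKVLATGIVG", "s3": "PPPPPPPPPP"}
    for name, score in search("MKVLAAGIVG", db):
        print(name, score)
```

In this sketch the expensive dynamic-programming alignment runs only on sequences that pass the cheap set-intersection test, which is the source of the savings a filter-and-refine design aims for. A real system would precompute and index the signatures rather than rebuild them per query.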