A minimum cost process in searching for a set of similar DNA sequences

Authors:
M. Yazid M. Saman;M. Nordin A. Rahman;Aziz Ahmad;A. Osman M. Tap
Affiliations:
Computer Science Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia;Information Technology Center, University of Darul Iman, Terengganu, Malaysia;Biology Science Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia;Mathematics Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia
Venue:
TELE-INFO'06 Proceedings of the 5th WSEAS international conference on Telecommunications and informatics
Year:
2006

Abstract

DNA sequence alignment for similarity search is a vital topic in bioinformatics algorithm development. Computationally searching a large-scale DNA database for a set of sequences, S, that are similar to a query sequence, q, is complicated and requires high processor performance as well as large memory space. Frequently, dynamic programming algorithms with quadratic running time are used to produce an optimal local sequence alignment. However, such algorithms are time-consuming when dealing with long DNA sequences. By means of local alignment, this paper presents a framework for searching for a set of similar sequences in large-scale DNA databases with reliable output and minimum cost. The Knuth-Morris-Pratt (KMP) algorithm is adapted to act as a filtering mechanism before exhaustive dynamic programming is applied. The KMP algorithm is used to scan the sequences in the databases for patterns generated from the query sequence. This filtering process generates scores that are used for ranking. The Smith-Waterman algorithm is then applied to each sequence, starting from the top of the constructed ranking. The paper also discusses the optimal pattern length for the database scanning process. The experimental results show that the proposed filtering mechanism discards irrelevant sequences. Therefore, the time for searching and retrieving the set of sequences similar to the query from the databases is minimized.
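The filter-then-align idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how patterns are generated or how the filtering score is computed, so the sketch assumes that patterns are the distinct overlapping fixed-length substrings of the query and that a sequence's score is the number of those patterns it contains. The function names, the scoring parameters (`match`, `mismatch`, `gap`), and the `top` cutoff are likewise hypothetical.

```python
def failure_table(pattern):
    """KMP failure function: for each i, the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    count = 0
    k = 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = table[k - 1]
    return count


def query_patterns(query, length):
    """ASSUMPTION: patterns are the distinct overlapping substrings
    of a fixed length taken from the query."""
    return sorted({query[i:i + length] for i in range(len(query) - length + 1)})


def filter_score(query, sequence, length):
    """ASSUMPTION: score = number of query patterns found in the sequence."""
    return sum(1 for p in query_patterns(query, length)
               if kmp_count(sequence, p) > 0)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty.
    Runs in O(len(a) * len(b)) time, keeping only two rows in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database, length=4, top=2):
    """Rank sequences by the cheap KMP filter score, then run the
    quadratic Smith-Waterman step only on the top-ranked candidates."""
    ranked = sorted(database,
                    key=lambda name: filter_score(query, database[name], length),
                    reverse=True)
    return [(name, smith_waterman(query, database[name]))
            for name in ranked[:top]]
```

Calling `search(query, database)` with `database` as a dictionary mapping sequence names to DNA strings returns the `top` candidates in filter order, each paired with its local alignment score. The cost saving comes from running the linear-time KMP scan over every sequence while reserving the quadratic alignment for the few sequences that survive the ranking.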