A minimum cost process in searching for a set of similar DNA sequences

  • Authors:
  • M. Yazid M. Saman;M. Nordin A. Rahman;Aziz Ahmad;A. Osman M. Tap

  • Affiliations:
  • Computer Science Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia;Information Technology Center, University of Darul Iman, Terengganu, Malaysia;Biology Science Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia;Mathematics Department, Kolej Universiti Sains dan Teknologi Malaysia, Terengganu, Malaysia

  • Venue:
  • TELE-INFO'06 Proceedings of the 5th WSEAS international conference on Telecommunications and informatics
  • Year:
  • 2006

Abstract

DNA sequence alignment for similarity search is a vital topic in bioinformatics algorithm development. Computationally searching large DNA databases for a set of sequences, S, that are similar to a query sequence, q, is very complicated and requires high processor performance as well as large memory space. Dynamic programming algorithms with quadratic running time are frequently used to produce an optimal local sequence alignment. However, these algorithms are time consuming when dealing with long DNA sequences. By means of local alignment, this paper presents a framework for searching large DNA databases for a set of similar sequences with reliable output and minimum cost. The Knuth-Morris-Pratt (KMP) algorithm is adapted to act as a filtering mechanism before exhaustive dynamic programming is applied. The KMP algorithm is used to scan the sequences in the databases for patterns generated from the query sequence. This filtering process generates scores which are used for ranking. The Smith-Waterman algorithm is then applied to each sequence, starting from the top of the constructed ranking. The paper also discusses the optimal pattern length for the database scanning process. The experimental results show that the proposed filtering mechanism discards irrelevant sequences. Therefore, the time for searching the databases and retrieving the set of sequences similar to the query is minimized.
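
The filter-then-align pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how patterns are generated or how filter scores are computed, so this sketch assumes that patterns are all overlapping substrings of a fixed length taken from the query, and that a sequence's filter score is the number of distinct patterns found in it by KMP. The function names (`kmp_count`, `filter_score`, `smith_waterman`, `search`), the parameters `length` and `top`, and the scoring values (match 2, mismatch -1, linear gap -1) are all hypothetical choices made for illustration.

```python
def kmp_failure(pattern):
    """Build the KMP failure table: longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text using KMP."""
    if not pattern:
        return 0
    fail = kmp_failure(pattern)
    count = 0
    k = 0
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = fail[k - 1]
    return count


def filter_score(query, sequence, length):
    """Assumed scoring rule: number of distinct query patterns present in sequence."""
    patterns = {query[i:i + length] for i in range(len(query) - length + 1)}
    return sum(1 for p in patterns if kmp_count(sequence, p) > 0)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (quadratic time, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search(query, database, length=4, top=2):
    """Rank sequences by filter score, then align only the top candidates."""
    scores = {name: filter_score(query, seq, length) for name, seq in database.items()}
    # Sequences sharing no pattern with the query are discarded by the filter.
    ranked = sorted((n for n in scores if scores[n] > 0),
                    key=lambda n: scores[n], reverse=True)
    aligned = [(n, smith_waterman(query, database[n])) for n in ranked[:top]]
    return sorted(aligned, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {
        "s1": "TTTTACGTACGTTTTT",
        "s2": "GGGGGGGGGG",
        "s3": "CCACGTCC",
    }
    for name, score in search("ACGTACGT", database):
        print(name, score)
```

In this sketch the quadratic Smith-Waterman step runs only on sequences that survive the linear-time KMP filter, which is the source of the cost saving the abstract claims. The pattern length trades selectivity against sensitivity: longer patterns discard more sequences but may miss weakly similar ones.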