A fast bit-parallel multi-patterns string matching algorithm for biological sequences

  • Authors:
  • Rajesh Prasad;Suneeta Agarwal;Ishadutta Yadav;Bharat Singh

  • Affiliations:
  • Motilal Nehru National Institute of Technology, Allahabad, India;Motilal Nehru National Institute of Technology, Allahabad, India;Motilal Nehru National Institute of Technology, Allahabad, India;Motilal Nehru National Institute of Technology, Allahabad, India

  • Venue:
  • ISB '10 Proceedings of the International Symposium on Biocomputing
  • Year:
  • 2010

Abstract

The problem of finding the occurrences of a pattern P[0...m-1] in a text T[0...n-1] with m ≤ n, where the symbols of P and T are drawn from some alphabet Σ of size σ, is called the exact string matching problem. Today, pattern matching is a powerful tool for locating nucleotide or amino acid sequence patterns in biological sequence databases. The problem of searching for a set of patterns P0, P1, P2, ..., Pr-1, r ≥ 1, in a given text T is called the multi-pattern string matching problem. This problem has previously been solved by efficient bit-parallel string matching algorithms: shift-or and BNDM. Many other types of algorithms exist for the same purpose, but bit-parallelism has been shown to be more efficient than the others. In this paper, we extend the BNDM algorithm with q-grams (B. Durian et al., 2008) to multiple patterns, where each pattern is a DNA pattern. We assume that all patterns have the same length m and that the total length of the patterns is less than or equal to the word length (w) of the computer used. Since the BNDM algorithm has been shown to be faster than any other bit-parallel string matching algorithm (G. Navarro, 2000), we compare the performance of the multi-pattern q-gram BNDM algorithm with the existing BNDM algorithm for different values of q and different numbers of patterns (r).
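
To make the bit-parallel idea concrete, the following is a minimal sketch of plain multi-pattern BNDM for equal-length patterns. It is not the authors' implementation and leaves out the q-gram extension described in the abstract. The function name `multi_bndm`, the block layout (pattern k occupies bits k·m to (k+1)·m−1), and the variable names are assumptions made for illustration. Python integers are unbounded, so the constraint r·m ≤ w is not enforced here; in a fixed-width language it would have to hold.

```python
def multi_bndm(patterns, text):
    """Return (position, pattern_index) pairs for every occurrence in text.

    Illustrative sketch of multi-pattern BNDM without q-grams.
    All patterns must be non-empty and have the same length m.
    """
    r = len(patterns)
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    n = len(text)

    # Pattern k owns bits [k*m, (k+1)*m). Within a block, character i of the
    # pattern maps to bit (m-1-i), so a set top bit means a pattern prefix
    # has been recognised.
    masks = {}
    for k, p in enumerate(patterns):
        for i, c in enumerate(p):
            masks[c] = masks.get(c, 0) | (1 << (k * m + m - 1 - i))

    all_ones = (1 << (r * m)) - 1
    high = sum(1 << (k * m + m - 1) for k in range(r))  # top bit of each block
    low = sum(1 << (k * m) for k in range(r))           # bottom bit of each block
    keep = all_ones & ~low  # after a shift, drop bits carried in from the block below

    hits = []
    pos = 0
    while pos <= n - m:
        j = m          # number of window characters not yet read
        shift = m      # safe shift if no pattern prefix is seen in the window
        state = all_ones
        while state:
            # Scan the window from right to left.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    # A window suffix equals some pattern prefix: remember it.
                    shift = j
                else:
                    # The whole window was read: report each matching pattern.
                    for k in range(r):
                        if state & (1 << (k * m + m - 1)):
                            hits.append((pos, k))
            state = (state << 1) & keep
        pos += shift
    return hits
```

For example, `multi_bndm(["ACG", "CGT"], "ACGTACGACG")` returns `[(0, 0), (1, 1), (4, 0), (7, 0)]`: pattern 0 (`ACG`) occurs at positions 0, 4 and 7, and pattern 1 (`CGT`) at position 1. The `keep` mask is what separates the patterns: without it, the top bit of one block would leak into the bottom bit of the next block on every shift.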