In this study we have designed a novel algorithm for searching common segments in multiple DNA sequences. To make pattern searching more efficient, the algorithm combines hash encoding, quicksort, and ladderlike stepping and/or interval jumping techniques. Multiple sequence alignment of DNA sequences drawn from large genomic databases is usually time-consuming, so we develop a three-phase methodology that searches for common sub-segments and reduces the time complexity of pattern matching. In the first (coding) phase, DNA nucleotide sequences are transformed into a set of numerical values. In the second (sorting) phase, quicksort is used to reorder the encoded data. In the last (searching) phase, ladderlike stepping and interval jumping rules are proposed to make the numerical comparisons more efficient. In addition, two interval segmentation techniques, uniform partition and bitwise partition, are applied before the interval jumping procedures. The segmentation methods are designed according to the length of the search pattern, and the proposed ladderlike searching algorithms provide robust, improved performance. Experimental results show that the algorithms reduce the time complexity from O(m L_i (L_i - m + 1) + m L_j (L_j - m + 1)) to O(|I_i| + |I_j|).
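The three phases (coding, sorting, searching) can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions. The 2-bit nucleotide code (A=0, C=1, G=2, T=3) and the names `encode_windows` and `common_segments` are hypothetical. Python's built-in sort (Timsort) stands in for quicksort. A plain two-pointer merge over the sorted codes stands in for the ladderlike stepping and interval jumping rules, whose details the abstract does not give. Input is assumed to be uppercase A/C/G/T only, and only two sequences are compared.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-bit code per nucleotide (an assumption: the abstract does not
# specify the paper's hash encoding). Input must be uppercase A/C/G/T.
CODE: Dict[str, int] = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_windows(seq: str, m: int) -> List[Tuple[int, int]]:
    """Phase 1 (coding): map every length-m window of seq to an integer.

    Returns (code, start_position) pairs. A rolling update keeps the cost
    linear in len(seq) instead of proportional to m * len(seq).
    """
    if m <= 0 or m > len(seq):
        return []
    mask = (1 << (2 * m)) - 1  # keep only the last m nucleotides (2 bits each)
    value = 0
    out: List[Tuple[int, int]] = []
    for i, base in enumerate(seq):
        value = ((value << 2) | CODE[base]) & mask
        if i >= m - 1:  # a full window ending at i is available
            out.append((value, i - m + 1))
    return out


def common_segments(seq_i: str, seq_j: str, m: int) -> List[Tuple[str, int, int]]:
    """Find all length-m segments shared by seq_i and seq_j.

    Returns (segment, position_in_seq_i, position_in_seq_j) triples,
    ordered by encoded value.
    """
    # Phase 2 (sorting): built-in sort (Timsort) used here instead of quicksort.
    a = sorted(encode_windows(seq_i, m))
    b = sorted(encode_windows(seq_j, m))

    # Phase 3 (searching): two-pointer merge over the sorted codes, a stand-in
    # for the paper's ladderlike stepping / interval jumping rules.
    hits: List[Tuple[str, int, int]] = []
    p = q = 0
    while p < len(a) and q < len(b):
        if a[p][0] < b[q][0]:
            p += 1
        elif a[p][0] > b[q][0]:
            q += 1
        else:
            code = a[p][0]
            # Find the run of equal codes on each side, then pair every
            # occurrence in seq_i with every occurrence in seq_j.
            p_end = p
            while p_end < len(a) and a[p_end][0] == code:
                p_end += 1
            q_end = q
            while q_end < len(b) and b[q_end][0] == code:
                q_end += 1
            for _, pos_i in a[p:p_end]:
                for _, pos_j in b[q:q_end]:
                    hits.append((seq_i[pos_i:pos_i + m], pos_i, pos_j))
            p, q = p_end, q_end
    return hits
```

For example, `common_segments("ACGTAC", "TTACGT", 3)` returns `[('ACG', 0, 2), ('CGT', 1, 3), ('TAC', 3, 1)]`. After sorting, the merge visits each encoded window once. Its cost is therefore linear in the lengths of the two sorted lists plus the number of matches reported. The rolling encoding keeps phase 1 linear in sequence length.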