Fast matching method for DNA sequences

Authors:
Jin Wook Kim;Eunsang Kim;Kunsoo Park
Affiliations:
HM Research, Seoul, Korea;School of Computer Science and Engineering, Seoul National University, Seoul, Korea;School of Computer Science and Engineering, Seoul National University, Seoul, Korea
Venue:
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
Year:
2007

Abstract

DNA sequences are the fundamental information for each species, and comparing the DNA sequences of different species is an important task. Because DNA sequences are very long and there are many species, both fast matching and efficient storage are important for DNA sequences. A fast string matching method suited to encoded DNA sequences is therefore needed. In this paper, we present a fast string matching method for encoded DNA sequences that does not decode the sequences while matching. We use a four-characters-to-one-byte encoding and combine a suffix approach with a multipattern matching approach. Experimental results show that our method is about 5 times faster than AGREP and is the fastest among known algorithms.
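
To make the four-characters-to-one-byte encoding concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a 2-bit code per nucleotide (the particular assignment A=00, C=01, G=10, T=11 and the high-bits-first packing order are illustrative assumptions; the abstract does not specify them). The helper names encode_dna, base_at, and naive_find are hypothetical. The naive search is only a baseline showing that comparison can proceed on the packed bytes without decoding; it does not implement the suffix or multipattern techniques the abstract mentions.

```python
# Hypothetical 2-bit code assignment; the paper's actual mapping may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_dna(seq: str) -> bytes:
    """Pack four nucleotides into each byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final partial chunk by padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def base_at(packed: bytes, i: int) -> int:
    """Return the 2-bit code of base i without unpacking the whole sequence."""
    shift = 6 - 2 * (i % 4)
    return (packed[i // 4] >> shift) & 0b11


def naive_find(packed_text: bytes, n: int, pattern: str) -> list[int]:
    """Naive baseline: compare 2-bit codes in place on the packed text.

    n is the number of bases in the text (needed because the last byte
    may contain padding). This is not the paper's algorithm.
    """
    codes = [CODE[b] for b in pattern]
    m = len(codes)
    return [
        s
        for s in range(n - m + 1)
        if all(base_at(packed_text, s + j) == codes[j] for j in range(m))
    ]
```

Under this packing, a pattern occurrence can start at any of the four positions within a byte, so the same pattern has four possible byte-aligned forms. That observation is one plausible reason a multipattern approach is a natural fit for encoded text, though the abstract does not spell out the details.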