Fast matching method for DNA sequences

  • Authors:
  • Jin Wook Kim; Eunsang Kim; Kunsoo Park

  • Affiliations:
  • HM Research, Seoul, Korea; School of Computer Science and Engineering, Seoul National University, Seoul, Korea; School of Computer Science and Engineering, Seoul National University, Seoul, Korea

  • Venue:
  • ESCAPE'07: Proceedings of the First International Conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
  • Year:
  • 2007

Abstract

DNA sequences are the fundamental information for each species, and comparing the DNA sequences of different species is an important task. Because DNA sequences are very long and there are many species, both fast matching and efficient storage are important for DNA sequences. A fast string matching method suited to encoded DNA sequences is therefore needed. In this paper, we present a fast string matching method for encoded DNA sequences that does not decode the sequences while matching. We use a four-characters-to-one-byte encoding and combine a suffix approach with a multipattern matching approach. Experimental results show that our method is about 5 times faster than AGREP and is the fastest among the known algorithms.
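
To make the idea of matching on encoded data concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a 2-bit code per base (A=0, C=1, G=2, T=3, first base in the high bits of each byte), which the abstract does not specify, and the names `encode` and `find_all` are hypothetical. Because a pattern can begin at any of the four base slots inside a byte, the sketch builds four shifted, masked copies of the encoded pattern and compares them against the packed text directly. This is one plausible reason a multipattern view arises. The scan itself is a naive byte-by-byte comparison and does not use the suffix-based skipping mentioned in the abstract.

```python
# Assumed 2-bit code per base; the abstract does not state the actual mapping.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack 4 bases per byte, first base in the high bits; zero-pad the last byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, ch in enumerate(seq):
        out[i // 4] |= CODE[ch] << (6 - 2 * (i % 4))
    return bytes(out)


def find_all(enc_text, n, pattern):
    """Return sorted start positions (in bases) of pattern in a packed text of n bases.

    The packed text is never decoded: each candidate is checked with
    byte-wise AND and compare operations only.
    """
    m = len(pattern)
    hits = []
    for k in range(4):  # the pattern may start at any of the 4 slots in a byte
        total = k + m
        # Shift the pattern by k filler bases; the filler encodes to zero bits.
        pat = encode("A" * k + pattern)
        # The mask has two bits set for every real pattern base, zero elsewhere.
        mask = bytearray(len(pat))
        for i in range(k, total):
            mask[i // 4] |= 3 << (6 - 2 * (i % 4))
        for b in range(len(enc_text) - len(pat) + 1):
            if 4 * b + total > n:  # would run into the padding of the last byte
                break
            if all((enc_text[b + j] & mask[j]) == pat[j] for j in range(len(pat))):
                hits.append(4 * b + k)
    return sorted(hits)
```

For example, `find_all(encode("ACGTACGTTAGC"), 12, "TA")` returns `[3, 8]`. The text length `n` is passed separately because the zero padding in the final byte is otherwise indistinguishable from a run of `A` bases under this assumed code.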