A practical method for approximate subsequence search in DNA databases

  • Authors:
  • Jung-Im Won; Sang-Kyoon Hong; Jee-Hee Yoon; Sanghyun Park; Sang-Wook Kim

  • Affiliations:
  • College of Information and Communications, Hanyang University, Korea; Division of Information Engineering and Telecommunications, Hallym University, Korea; Division of Information Engineering and Telecommunications, Hallym University, Korea; Department of Computer Science, Yonsei University, Korea; College of Information and Communications, Hanyang University, Korea

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007

Abstract

In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The method uses a binary trie as its primary index structure and stores in it all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie breadth-first and uses dynamic programming to retrieve every matching subsequence along the traversed paths. However, because the index stores only window subsequences of a predetermined length, long query sequences incur a large post-processing cost. To overcome this problem, we divide a long query sequence into shorter pieces, search for each piece separately, and then merge the results.
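
The indexing and traversal steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 2-bit base encoding, the use of edit distance with a threshold k as the similarity measure, and all names (CODE, Node, build_trie, search) are assumptions made for illustration. The query-splitting and result-merging step is not covered.

```python
from collections import deque

# Assumed 2-bit encoding of the four bases (not specified in the abstract).
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
BASE = {bits: base for base, bits in CODE.items()}


class Node:
    """One node of the binary trie; each edge is a single bit."""
    __slots__ = ("children", "positions")

    def __init__(self):
        self.children = {}   # '0' or '1' -> Node
        self.positions = []  # start offsets of windows that end at this node


def build_trie(seq, w):
    """Insert every window of length w (one per start offset) into the trie."""
    root = Node()
    for start in range(len(seq) - w + 1):
        node = root
        for base in seq[start:start + w]:
            for bit in CODE[base]:
                node = node.children.setdefault(bit, Node())
        node.positions.append(start)
    return root


def collect_positions(node):
    """Return the start offsets of all windows stored below a node."""
    stack, out = [node], []
    while stack:
        n = stack.pop()
        out.extend(n.positions)
        stack.extend(n.children.values())
    return out


def search(root, query, k):
    """Breadth-first traversal with one edit-distance DP row per trie path.

    Returns a set of (start, length, distance) triples: the window prefix of
    that length starting at that offset is within edit distance k of query.
    """
    m = len(query)
    hits = set()
    # Queue items: (node, pending first bit of a base, DP row, bases consumed).
    queue = deque([(root, "", list(range(m + 1)), 0)])
    while queue:
        node, pending, row, depth = queue.popleft()
        for bit, child in node.children.items():
            if not pending:
                # Only half a base has been read; the DP row is unchanged.
                queue.append((child, bit, row, depth))
                continue
            base = BASE[pending + bit]
            new = [row[0] + 1]
            for j in range(1, m + 1):
                new.append(min(row[j] + 1,          # extra base on the path
                               new[j - 1] + 1,      # extra base in the query
                               row[j - 1] + (query[j - 1] != base)))
            if min(new) > k:
                continue  # no extension of this path can get back within k
            if new[m] <= k:
                for start in collect_positions(child):
                    hits.add((start, depth + 1, new[m]))
            queue.append((child, "", new, depth + 1))
    return hits


trie = build_trie("ACGTACGGA", w=4)
exact = search(trie, "ACG", k=0)
approx = search(trie, "ACG", k=1)
```

In this sketch, a path is pruned as soon as every entry of its DP row exceeds k, so the traversal visits only the part of the trie that can still yield a match. Because only windows of length w are indexed, a query much longer than w cannot be matched in full by a single trie path, which is the limitation the abstract addresses by splitting the query.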