Research article: Deposition and extension approach to find longest common subsequence for thousands of long sequences

  • Authors: Kang Ning
  • Affiliations: Department of Pathology, University of Michigan, 4237 Medical Science I, Ann Arbor, MI 48109, USA
  • Venue: Computational Biology and Chemistry
  • Year: 2010

Abstract

Finding the longest common subsequence (LCS) of an arbitrary number of sequences is an interesting and challenging problem in computer science. The problem is NP-complete, but because of its importance many heuristic algorithms have been proposed, such as Long Run, the Expansion Algorithm and THSB. However, the performance of many current heuristics, in both result quality and running time, deteriorates quickly as the number of sequences and the sequence length increase. In this paper, we propose a post-processing heuristic for the LCS problem, the Deposition and Extension Algorithm (DEA). The algorithm first generates a common subsequence by "sequence deposition", based on fine-tuning of the search range, and then extends this common subsequence. The algorithm is proven to generate common subsequences (CSs) with guaranteed lengths. Experiments on different datasets showed that the results of DEA were better than those of Long Run and the Expansion Algorithm, especially on large sets of long sequences. The algorithm was also more efficient in both time and memory.
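To make the notions of "common subsequence" and "extension" concrete, the following is a minimal Python sketch. It is not the DEA procedure from the paper, whose deposition step and search-range tuning are not specified in this abstract. It only shows how a candidate can be checked against every input sequence, and how a naive greedy pass might lengthen it by single-character insertions. The function names, the alphabet parameter and the sample sequences are all illustrative assumptions.

```python
from typing import List


def is_subsequence(candidate: str, sequence: str) -> bool:
    """True if `candidate` can be obtained from `sequence` by deleting characters."""
    it = iter(sequence)
    # `ch in it` consumes the iterator up to the first match, so order is preserved.
    return all(ch in it for ch in candidate)


def is_common_subsequence(candidate: str, sequences: List[str]) -> bool:
    """True if `candidate` is a subsequence of every input sequence."""
    return all(is_subsequence(candidate, s) for s in sequences)


def extend_once(cs: str, sequences: List[str], alphabet: str) -> str:
    """Naive extension step (an assumption, not the paper's method):
    try every single-character insertion and return the first one that is
    still a common subsequence; return `cs` unchanged if none works."""
    for pos in range(len(cs) + 1):
        for ch in alphabet:
            longer = cs[:pos] + ch + cs[pos:]
            if is_common_subsequence(longer, sequences):
                return longer
    return cs


def greedy_extend(cs: str, sequences: List[str], alphabet: str) -> str:
    """Repeat `extend_once` until no single insertion lengthens the result."""
    while True:
        longer = extend_once(cs, sequences, alphabet)
        if longer == cs:
            return cs
        cs = longer


if __name__ == "__main__":
    # Hypothetical toy input; real instances involve thousands of long sequences.
    seqs = ["ACGTAC", "AGTCAC", "ACTGAC"]
    print(greedy_extend("", seqs, "ACGT"))
```

Each membership check in this sketch costs time linear in the total input length, so the approach is only useful for illustration. The result is a common subsequence that no single insertion can lengthen, which does not guarantee that it is a longest one.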