A Novel Algorithm for Counting All Common Subsequences

  • Authors:
  • Hui Wang;Zhiwei Lin

  • Affiliations:
  • -;-

  • Venue:
  • GRC '07 Proceedings of the 2007 IEEE International Conference on Granular Computing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

A similarity measure is key to any distance-based data analysis task, e.g., clustering, classification and case-based reasoning. The choice of similarity measure depends on the type of data used in a data analysis task. For sequence data the well known similarity measures include sequence edit distance and longest common subsequence. A new measure of sequence similarity, all common subsequence (ACS) [5], is recently proposed. ACS uses the number of common sub- sequences between a pair of sequences as a measure of sim- ilarity. The more common subsequences there are between them, the more similar they are to each other. A dynamic programming algorithm is presented in [5] to compute ACS, which we call calACS-DP. In this paper we analyze the complexities of this algorithm in time and space, which are both quadratic in the length of sequence. Based on this analysis we present a novel, more efficient algorithm to compute ACS, which we call calACS. This al- gorithm is derived from an algorithm presented in [4]. The space complexity of this algorithm is O(min{|A|, |B|}+1) and the time complexity of this algorithm is O(|A| 脳 |B|) for any two sequences A and B. We further illustrate this algorithm with an example.