A Novel Algorithm for Counting All Common Subsequences

Authors:
Hui Wang;Zhiwei Lin
Affiliations:
-;-
Venue:
GRC '07 Proceedings of the 2007 IEEE International Conference on Granular Computing
Year:
2007

Citing 0
Cited 3

XMin: Minimizing Tree Pattern Queries with Minimality Guarantee

World Wide Web
Concordance and consensus

Information Sciences: an International Journal
Kernels for acyclic digraphs

Pattern Recognition Letters

Quantified Score

Hi-index	0.00

Visualization

Abstract

A similarity measure is key to any distance-based data analysis task, e.g., clustering, classification and case-based reasoning. The choice of similarity measure depends on the type of data used in a data analysis task. For sequence data the well known similarity measures include sequence edit distance and longest common subsequence. A new measure of sequence similarity, all common subsequence (ACS) [5], is recently proposed. ACS uses the number of common sub- sequences between a pair of sequences as a measure of sim- ilarity. The more common subsequences there are between them, the more similar they are to each other. A dynamic programming algorithm is presented in [5] to compute ACS, which we call calACS-DP. In this paper we analyze the complexities of this algorithm in time and space, which are both quadratic in the length of sequence. Based on this analysis we present a novel, more efficient algorithm to compute ACS, which we call calACS. This al- gorithm is derived from an algorithm presented in [4]. The space complexity of this algorithm is O(min{|A|, |B|}+1) and the time complexity of this algorithm is O(|A| 脳 |B|) for any two sequences A and B. We further illustrate this algorithm with an example.