Bounds on the Complexity of the Longest Common Subsequence Problem

  • Authors:
  • J. D. Ullman; A. V. Aho; D. S. Hirschberg

  • Affiliations:
  • Department of Electrical Engineering, Princeton University, Princeton, NJ; Bell Laboratories, Inc., 600 Mountain Avenue, Murray Hill, NJ; Department of Electrical Engineering, Rice University, Houston, TX and Princeton University, Princeton, NJ

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 1976

Abstract

The problem of finding a longest common subsequence of two strings is discussed. This problem arises in data processing applications such as comparing two files and in genetic applications such as studying molecular evolution. The difficulty of computing a longest common subsequence of two strings is examined using the decision tree model of computation, in which vertices represent “equal-unequal” comparisons. It is shown that unless a bound on the total number of distinct symbols is assumed, every solution to the problem can consume an amount of time that is proportional to the product of the lengths of the two strings. A general lower bound as a function of the ratio of alphabet size to string length is derived. The case where comparisons between symbols of the same string are forbidden is also considered, and it is shown that this problem is of linear complexity for a two-symbol alphabet and quadratic for an alphabet of three or more symbols.
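
To make the "product of the lengths" cost concrete, here is a minimal sketch of the standard textbook dynamic-programming method for the longest common subsequence. It is not taken from the paper, which concerns lower bounds rather than a particular algorithm. The function name `lcs_with_count` and the comparison counter are assumptions introduced for illustration. The sketch uses only equal-unequal tests on symbols, and every test compares a symbol of one string with a symbol of the other.

```python
def lcs_with_count(a, b):
    """Return (one longest common subsequence of a and b as a list,
    number of equal-unequal symbol comparisons performed).

    Illustrative textbook dynamic program; hypothetical helper, not the
    paper's construction.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    comparisons = 0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            comparisons += 1  # one equal-unequal test per table cell
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover one subsequence.
    # No further symbol comparisons are needed: if a cell is larger than
    # both its upper and left neighbours, the symbols there must match.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if table[i][j] == table[i - 1][j]:
            i -= 1
        elif table[i][j] == table[i][j - 1]:
            j -= 1
        else:
            result.append(a[i - 1])
            i -= 1
            j -= 1
    result.reverse()
    return result, comparisons


if __name__ == "__main__":
    subsequence, count = lcs_with_count("ABCBDAB", "BDCABA")
    print(len(subsequence), count)
```

For strings of lengths m and n this sketch always performs exactly m × n symbol comparisons, which matches the order of the quadratic bound described in the abstract for unbounded alphabets.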