Improved bounds on the average length of longest common subsequences

  • Authors: George S. Lueker
  • Affiliations: University of California, Irvine, California
  • Venue: Journal of the ACM (JACM)
  • Year: 2009

Abstract

It has long been known [Chvátal and Sankoff 1975] that the average length of the longest common subsequence of two random strings of length n over an alphabet of size k is asymptotic to γ_k n for some constant γ_k depending on k. The value of these constants remains unknown, and a number of papers have proved upper and lower bounds on them. We discuss techniques, involving numerical calculations with recurrences on many variables, for determining lower and upper bounds on these constants. To our knowledge, the previous best-known lower and upper bounds for γ_2 were those of Dančík and Paterson, approximately 0.773911 and 0.837623 [Dančík 1994; Dančík and Paterson 1995]. We improve these to 0.788071 and 0.826280. This upper bound is less than the γ_2 given by Steele's old conjecture (see Steele [1997, page 3]) that γ_2 = 2/(1 + √2) ≈ 0.828427. (As Steele points out, experimental evidence had already suggested that this conjectured value was too high.) Finally, we show that the upper bound technique described here could be used to produce, for any k, a sequence of upper bounds converging to γ_k, though the computation time grows very quickly as better bounds are guaranteed.
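
As a rough illustration of the quantity being bounded, the following Python sketch estimates the average longest-common-subsequence length divided by n for random strings over an alphabet of size k. This is a plain Monte Carlo simulation using the textbook dynamic program, not the recurrence-based technique of the paper; the function names, the trial count, and the seed are all assumptions chosen for this example. Because the expected LCS length is superadditive in n, finite-n averages approach γ_k from below, so small n will tend to underestimate the constant.

```python
import math
import random

# Bounds and conjectured value for gamma_2, as quoted in the abstract.
LOWER_BOUND_GAMMA_2 = 0.788071
UPPER_BOUND_GAMMA_2 = 0.826280
STEELE_CONJECTURE = 2 / (1 + math.sqrt(2))  # approximately 0.828427


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Standard O(len(a) * len(b)) dynamic program, keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_ratio(n, k=2, trials=20, seed=0):
    """Monte Carlo estimate of E[LCS length] / n for random strings.

    Hypothetical helper for illustration only: draws `trials` pairs of
    uniform random strings of length n over {0, ..., k-1}.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    print("Steele's conjectured value:", round(STEELE_CONJECTURE, 6))
    print("estimate for n = 200:", estimate_ratio(200))
```

Such a simulation only suggests where γ_2 lies; it proves nothing, whereas the bounds 0.788071 and 0.826280 quoted above are rigorous.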