A learning algorithm for the longest common subsequence problem

  • Authors:
  • Eric A. Breimer, Mark K. Goldberg, Darren T. Lim

  • Affiliations:
  • Rensselaer Polytechnic Institute (all authors)

  • Venue:
  • Journal of Experimental Algorithmics (JEA)
  • Year:
  • 2003


Abstract

We present an experimental study of a learning algorithm for the longest common subsequence (LCS) problem. Given an arbitrary input domain, the algorithm learns an LCS-procedure tailored to that domain. The learning is done with the help of an oracle, which can be any LCS-algorithm. After solving a limited number of training inputs using the oracle, the learning algorithm outputs a new LCS-procedure. Our experiments demonstrate that, by allowing a slight loss of optimality, learning yields a procedure which is significantly faster than the oracle. The oracle used for the experiments is the np-procedure by Wu et al., a modification of Myers' classical LCS-algorithm. We show how to scale up the results of learning on small inputs to inputs of arbitrary lengths. For the domain of two random 2-symbol inputs of length n, learning yields a program with 0.999 expected accuracy, which runs in O(n^1.41) time, in contrast with the O(n^2/log n) running time of the fastest theoretical algorithm that produces optimal solutions. For the domain of random 2-symbol inputs of length 100,000, the program runs 10.5 times faster than the np-procedure, producing 0.999-accurate outputs. The scaled version of the evolved algorithm applied to random inputs of length 1 million runs approximately 30 times faster than the np-procedure while constructing 0.999-accurate solutions. We apply the evolved algorithm to DNA sequences of various lengths by training on random 4-symbol sequences of up to length 10,000. The evolved algorithm, scaled up to lengths of up to 1.8 million, produces solutions with 0.998 accuracy in a fraction of the time used by the np-procedure.
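For context on the accuracy figures quoted above (the ratio of the constructed subsequence's length to the optimal length), the sketch below shows the textbook quadratic dynamic program for LCS. This is not the np-procedure or the evolved algorithm from the paper, just the standard exact baseline against which approximate solutions are scored.

```python
def lcs(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic program; returns one optimal LCS."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack from dp[m][n] to recover one optimal subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Under this measure, a procedure that returns a common subsequence of length k on an input whose optimal LCS has length `len(lcs(a, b))` achieves accuracy k / len(lcs(a, b)); the paper's 0.999 and 0.998 figures are expected values of this ratio over the stated input domains.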