A learning algorithm for the longest common subsequence problem

  • Authors:
  • Eric A. Breimer, Mark K. Goldberg, Darren T. Lim

  • Affiliations:
  • Rensselaer Polytechnic Institute (all authors)

  • Venue:
  • Journal of Experimental Algorithmics (JEA)
  • Year:
  • 2003


Abstract

We present an experimental study of a learning algorithm for the longest common subsequence (LCS) problem. Given an arbitrary input domain, the algorithm learns an LCS-procedure tailored to that domain. The learning is done with the help of an oracle, which can be any LCS-algorithm. After solving a limited number of training inputs using the oracle, the learning algorithm outputs a new LCS-procedure. Our experiments demonstrate that, by allowing a slight loss of optimality, learning yields a procedure which is significantly faster than the oracle. The oracle used for the experiments is the np-procedure by Wu et al., a modification of Myers' classical LCS-algorithm. We show how to scale up the results of learning on small inputs to inputs of arbitrary lengths. For the domain of two random 2-symbol inputs of length n, learning yields a program with 0.999 expected accuracy, which runs in O(n^1.41) time, in contrast with the O(n^2/log n) running time of the fastest theoretical algorithm that produces optimal solutions. For the domain of random 2-symbol inputs of length 100,000, the program runs 10.5 times faster than the np-procedure, producing 0.999-accurate outputs. The scaled version of the evolved algorithm applied to random inputs of length 1 million runs approximately 30 times faster than the np-procedure while constructing 0.999-accurate solutions. We apply the evolved algorithm to DNA sequences of various lengths by training on random 4-symbol sequences of up to length 10,000. The evolved algorithm, scaled up to lengths of up to 1.8 million, produces solutions with 0.998 accuracy in a fraction of the time used by the np-procedure.
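For context on the accuracy figures quoted above (the ratio of the constructed subsequence's length to the optimal length), the sketch below shows the textbook quadratic dynamic program for LCS. This is not the np-procedure or the evolved algorithm from the paper, just the standard exact baseline against which approximate solutions are scored.

```python
def lcs(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic program; returns one optimal LCS."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack from dp[m][n] to recover one optimal subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Under this measure, a procedure that returns a common subsequence of length k on an input whose optimal LCS has length `len(lcs(a, b))` achieves accuracy k / len(lcs(a, b)); the paper's 0.999 and 0.998 figures are expected values of this ratio over the stated input domains.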