A fast algorithm for computing longest common subsequences

Authors:
James W. Hunt; Thomas G. Szymanski
Affiliations:
Stanford Univ., Stanford, CA; Princeton Univ., Princeton, NJ
Venue:
Communications of the ACM
Year:
1977

Abstract

Previously published algorithms for finding the longest common subsequence of two sequences of length n have had a best-case running time of O(n^2). An algorithm for this problem is presented which has a running time of O((r + n) log n), where r is the total number of ordered pairs of positions at which the two sequences match. Thus in the worst case the algorithm has a running time of O(n^2 log n). However, for those applications where most positions of one sequence match relatively few positions in the other sequence, a running time of O(n log n) can be expected.
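
The abstract gives only the running time, so the following is a minimal sketch of how an algorithm with this bound is commonly realized, not the authors' own presentation. It assumes the usual formulation: list the positions in one sequence at which each symbol occurs, visit the r matching pairs row by row, and maintain a sorted threshold array with binary search. The names lcs, positions, thresh and links are illustrative choices, and the symbols are assumed to be hashable.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    # For every symbol, the ascending list of positions where it occurs in b.
    positions = defaultdict(list)
    for j, sym in enumerate(b):
        positions[sym].append(j)

    # thresh[k]: smallest index in b at which a common subsequence of
    # length k + 1 can end, given the prefix of a processed so far.
    thresh = []
    # links[k]: (i, j, previous node) chain that realizes thresh[k].
    links = []

    for i, sym in enumerate(a):
        # Visit the matches of a[i] in decreasing j so that two matches
        # in the same row are never chained together.
        for j in reversed(positions.get(sym, ())):
            k = bisect_left(thresh, j)  # O(log n) per matching pair
            node = (i, j, links[k - 1] if k > 0 else None)
            if k == len(thresh):
                thresh.append(j)
                links.append(node)
            else:
                thresh[k] = j
                links[k] = node

    # Follow the back-pointers from the longest chain to recover the symbols.
    out = []
    node = links[-1] if links else None
    while node is not None:
        i, _, node = node
        out.append(a[i])
    out.reverse()
    return out
```

The inner loop runs once per matching pair, and each iteration costs one binary search, which is where the (r + n) log n shape of the bound comes from in this sketch. When every position matches every other (for example, two sequences made of a single repeated symbol), r is n^2 and the n^2 log n worst case applies.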