Speeding up transposition-invariant string matching

Authors: Sebastian Deorowicz
Affiliations: Silesian University of Technology, Institute of Computer Science, Gliwice, Poland
Venue: Information Processing Letters
Year: 2006

Abstract

Finding the longest common subsequence (LCS) of two given sequences $A = a_0 a_1 \ldots a_{m-1}$ and $B = b_0 b_1 \ldots b_{n-1}$ is an important and well-studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the LCS between the sequences $A + t = (a_0 + t)(a_1 + t) \ldots (a_{m-1} + t)$ and $B$, where $t$ is any integer. We introduce a family of algorithms (motivated by the Hunt-Szymanski scheme for LCS), improving the currently best known complexity from $O(mn \log\log\sigma)$ to $O(D \log\log\sigma + mn)$, where $\sigma$ is the alphabet size and $D \le mn$ is the total number of dominant matches for all transpositions. Then, we demonstrate experimentally that some of our algorithms outperform the best ones from the literature.
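
To make the LCTS definition concrete, the following is a minimal brute-force sketch in Python. It is not one of the Hunt-Szymanski-based algorithms introduced in the paper; it simply runs the textbook dynamic-programming LCS once per candidate transposition. The function names `lcs_length` and `lcts_length` are hypothetical, and the sequences are assumed to be lists of integers (for example, MIDI pitch numbers).

```python
def lcs_length(a, b):
    """Textbook O(mn) dynamic-programming LCS length (not the paper's method)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of either a or b, keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_length(a, b):
    """Brute-force LCTS: return (best LCS length, transposition t achieving it).

    Assumption: a and b are sequences of integers. Only transpositions of the
    form t = b_j - a_i can align any pair of symbols, so every other t yields
    an LCS of length 0 and can be skipped.
    """
    if not a or not b:
        return 0, 0
    candidates = sorted({y - x for x in a for y in b})
    best_len, best_t = 0, candidates[0]
    for t in candidates:
        length = lcs_length([x + t for x in a], b)
        if length > best_len:
            best_len, best_t = length, t
    return best_len, best_t
```

For instance, `lcts_length([60, 62, 64, 65, 67], [62, 64, 66, 67, 69])` returns `(5, 2)`, because the second sequence is the first one shifted up by two. Over an integer alphabet of size $\sigma$ there are at most $2\sigma - 1$ distinct transpositions to try, so this baseline costs $O(\sigma mn)$ time, which is the kind of cost the bounds quoted in the abstract improve on.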