Speeding up transposition-invariant string matching

  • Authors: Sebastian Deorowicz
  • Affiliations: Silesian University of Technology, Institute of Computer Science, Gliwice, Poland
  • Venue: Information Processing Letters
  • Year: 2006

Abstract

Finding the longest common subsequence (LCS) of two given sequences A = a_0 a_1 ... a_{m-1} and B = b_0 b_1 ... b_{n-1} is an important and well-studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the LCS between the sequences A + t = (a_0 + t)(a_1 + t) ... (a_{m-1} + t) and B, where t is any integer. We introduce a family of algorithms (motivated by the Hunt-Szymanski scheme for LCS), improving the best known complexity from O(mn log log σ) to O(D log log σ + mn), where σ is the alphabet size and D ≤ mn is the total number of dominant matches over all transpositions. Then, we demonstrate experimentally that some of our algorithms outperform the best ones from the literature.
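
To make the LCTS definition concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms introduced in the paper: it simply runs the textbook dynamic-programming LCS once per candidate transposition, so it costs O(mn) per transposition rather than the O(D log log σ + mn) total stated above. The function names lcs_length and lcts_length are hypothetical and chosen only for illustration, and sequences are assumed to be lists of integers (for example, MIDI pitch values).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcts_length(a, b):
    """Brute-force LCTS: return (best LCS length, transposition t achieving it).

    Only transpositions of the form t = y - x, with x in a and y in b,
    can produce any match, so only those values are tried.
    Ties are broken in favour of the smallest such t.
    """
    if not a or not b:
        return 0, 0
    candidates = sorted({y - x for x in a for y in b})
    best_len, best_t = 0, 0
    for t in candidates:
        length = lcs_length([x + t for x in a], b)
        if length > best_len:
            best_len, best_t = length, t
    return best_len, best_t


if __name__ == "__main__":
    # A melody and the same melody shifted up by 10 match completely at t = 10.
    print(lcts_length([1, 2, 3, 4], [11, 12, 13, 14]))  # (4, 10)
```

The sketch treats every transposition independently and recomputes a full DP table each time. The abstract's bound, by contrast, depends on D, the number of dominant matches summed over all transpositions, which suggests the paper's algorithms avoid work on table cells that do not correspond to such matches.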