Transposition invariant pattern matching for multi-track strings

  • Authors:
  • Kjell Lemström; Jorma Tarhio

  • Affiliations:
  • University of Helsinki, Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland; Helsinki University of Technology, Dept. of Computer Science and Engineering, P.O. Box 5400, FIN-02015 HUT, Finland

  • Venue:
  • Nordic Journal of Computing
  • Year:
  • 2003

Abstract

We consider the problem of multi-track string matching. The task is to find the occurrences of a pattern across parallel strings. Given an alphabet $\Sigma$ of natural numbers and a set $S$ of $h$ strings over $\Sigma$, $s^i = s^i_1 \cdots s^i_n$ for $i = 1, \ldots, h$, a pattern $p = p_1 \cdots p_m$ has such an occurrence at position $j$ of $S$ if $p_1 = s^{i_1}_j,\ p_2 = s^{i_2}_{j+1},\ \ldots,\ p_m = s^{i_m}_{j+m-1}$ holds for some $i_1, \ldots, i_m \in \{1, \ldots, h\}$. An application of the problem is music retrieval, where occurrences of a monophonic query pattern are searched for in a polyphonic music database. In music retrieval it is even more pertinent to allow invariance under pitch-level transpositions, i.e., the task is to find whether there are occurrences of $p$ in $S$ such that the formulation above becomes $p_1 = s^{i_1}_j + c,\ p_2 = s^{i_2}_{j+1} + c,\ \ldots,\ p_m = s^{i_m}_{j+m-1} + c$ for some constant $c$. We present several algorithms solving the problem. Our main contribution, the MONOPOLY algorithm, is a transposition-invariant bit-parallel filtering algorithm for static databases. After $O(nhe)$ time preprocessing, it finds candidates for transposition-invariant occurrences in time $O(n\lceil m/w \rceil + m + d)$, where $w$ is the size of the machine word in bits and $e$ and $d$ are two factors that depend on the size of the alphabet. A straightforward algorithm then checks whether each candidate is a proper occurrence; this check takes time $O(hm)$ per candidate.
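
The occurrence definition above can be illustrated with a minimal Python sketch of a per-position check. This is not the MONOPOLY filter and not necessarily the checking algorithm of the paper; it is one straightforward way to test a single position under the stated definition. The function names `transpositions_at` and `find_all`, the 0-based positions, and the use of hash sets (which gives expected rather than worst-case O(hm) time per position) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def transpositions_at(
    tracks: Sequence[Sequence[int]], pattern: Sequence[int], j: int
) -> List[int]:
    """Return every constant c such that pattern[k] == s + c for some
    value s found at position j + k of any track, for all k (0-based)."""
    m = len(pattern)
    n = len(tracks[0])  # assumption: all h tracks have the same length n
    if m == 0 or j < 0 or j + m > n:
        return []
    # Column k holds the values of all h tracks at position j + k.
    columns = [{track[j + k] for track in tracks} for k in range(m)]
    # The first pattern symbol must match some track, so c is one of
    # at most h differences pattern[0] - s.
    candidates = {pattern[0] - s for s in columns[0]}
    return sorted(
        c
        for c in candidates
        if all(pattern[k] - c in columns[k] for k in range(m))
    )


def find_all(
    tracks: Sequence[Sequence[int]], pattern: Sequence[int]
) -> List[Tuple[int, int]]:
    """Check every position naively; returns (position, c) pairs."""
    n = len(tracks[0])
    return [
        (j, c)
        for j in range(n - len(pattern) + 1)
        for c in transpositions_at(tracks, pattern, j)
    ]


# Two tracks (h = 2) of length n = 4, with pitches as integers.
tracks = [[60, 62, 64, 65], [48, 50, 67, 53]]
print(find_all(tracks, [62, 64, 69]))
```

In this example the pattern 62, 64, 69 matches at position 0 with c = 2 by taking 60 and 62 from the first track and 67 from the second, which shows how an occurrence may move between tracks. A filtering algorithm such as the one described in the abstract would call a check like this only on the candidate positions it reports, rather than on every position.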