Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance

  • Authors:
  • Kimmo Fredriksson; Szymon Grabowski

  • Affiliations:
  • Department of Computer Science, University of Kuopio, Kuopio, Finland 70211; Department of Computer Engineering, Technical University of Łódź, Łódź, Poland 90-924

  • Venue:
  • Information Retrieval
  • Year:
  • 2008

Abstract

We develop efficient dynamic programming algorithms for pattern matching with general gaps and character classes. We consider patterns of the form $p_0\, g(a_0,b_0)\, p_1\, g(a_1,b_1) \ldots p_{m-1}$, where $p_i \subset \Sigma$, $\Sigma$ is some finite alphabet, and $g(a_i,b_i)$ denotes a gap of length $a_i \ldots b_i$ between symbols $p_i$ and $p_{i+1}$. The text symbol $t_j$ matches $p_i$ iff $t_j \in p_i$. Moreover, we require that if $p_i$ matches $t_j$, then $p_{i+1}$ should match one of the text symbols $$ t_{j+a_i+1} \ldots t_{j+b_i+1}.$$ Either or both of $a_i$ and $b_i$ can be negative. We also consider transposition invariant matching, i.e., the matching condition becomes $t_j \in p_i + \tau$, for some constant $\tau$ determined by the algorithms. We give algorithms that have efficient average and worst case running times. The algorithms have important applications in music information retrieval and computational biology. We give experimental results showing that the algorithms work well in practice.
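
To make the matching condition concrete, the following is a minimal sketch of a straightforward dynamic programming formulation, written only from the definitions in the abstract. It is not the authors' algorithm and makes no claim to their running times. The function names `gapped_match` and `transposition_invariant_match`, the representation of character classes as Python sets, and the choice to report the text positions aligned with the last pattern symbol are all assumptions made for illustration.

```python
def gapped_match(text, classes, gaps):
    """Naive O(m*n) sketch (hypothetical helper, not the paper's algorithm).

    text    : sequence of symbols t_0 .. t_{n-1}
    classes : list of sets p_0 .. p_{m-1} (character classes)
    gaps    : list of (a_i, b_i) pairs, one per adjacent pair of classes
    Returns the text positions j at which p_{m-1} can be aligned.
    """
    n, m = len(text), len(classes)
    if m == 0 or len(gaps) != m - 1:
        raise ValueError("need m >= 1 classes and exactly m - 1 gaps")

    # row[j] is True when p_0 .. p_i can be matched with p_i aligned at t_j.
    row = [text[j] in classes[0] for j in range(n)]
    for i in range(1, m):
        a, b = gaps[i - 1]
        # prefix[k] counts the True entries in row[0:k], so any range
        # of previous alignments can be tested in constant time.
        prefix = [0] * (n + 1)
        for j in range(n):
            prefix[j + 1] = prefix[j] + row[j]
        nxt = [False] * n
        for j in range(n):
            if text[j] not in classes[i]:
                continue
            # p_{i-1} at position k allows p_i at k+a+1 .. k+b+1, hence
            # k must lie in j-b-1 .. j-a-1 (k may exceed j if a < 0).
            lo = max(0, j - b - 1)
            hi = min(n - 1, j - a - 1)
            if lo <= hi and prefix[hi + 1] - prefix[lo] > 0:
                nxt[j] = True
        row = nxt
    return [j for j in range(n) if row[j]]


def transposition_invariant_match(text, classes, gaps):
    """Brute-force sketch for integer alphabets: try every shift tau
    that lets some text symbol match p_0, and test t_j - tau in p_i."""
    candidates = {t - c for t in text for c in classes[0]}
    result = {}
    for tau in sorted(candidates):
        hits = gapped_match([t - tau for t in text], classes, gaps)
        if hits:
            result[tau] = hits
    return result
```

For example, `gapped_match("abcat", [{"a"}, {"c", "t"}], [(0, 1)])` aligns the second class at positions 2 and 4, and a negative gap such as `(-2, -2)` lets the second class match a text symbol that precedes the first. Trying every candidate shift multiplies the cost by the number of shifts, which is exactly the kind of overhead a dedicated transposition invariant algorithm would aim to avoid.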