Efficient algorithms for pattern matching with general gaps and character classes

  • Authors:
  • Kimmo Fredriksson;Szymon Grabowski

  • Affiliations:
  • Department of Computer Science, University of Joensuu, Joensuu, Finland;Computer Engineering Department, Technical University of Łódź, Łódź, Poland

  • Venue:
  • SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
  • Year:
  • 2006

Abstract

We develop efficient dynamic programming algorithms for pattern matching with general gaps and character classes. We consider patterns of the form $p_0\, g(a_0,b_0)\, p_1\, g(a_1,b_1) \ldots p_{m-1}$, where each $p_i \subset \Sigma$ is a character class over a finite alphabet $\Sigma$, and $g(a_i,b_i)$ denotes a gap of length between $a_i$ and $b_i$ separating symbols $p_i$ and $p_{i+1}$. The text symbol $t_j$ matches $p_i$ iff $t_j \in p_i$. Moreover, we require that if $p_i$ matches $t_j$, then $p_{i+1}$ must match one of the text symbols $t_{j+a_i+1} \ldots t_{j+b_i+1}$. Either or both of $a_i$ and $b_i$ can be negative. We give algorithms that have efficient average and worst case running times. The algorithms have important applications in music information retrieval and computational biology. We give experimental results showing that the algorithms work well in practice.
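The matching rule stated in the abstract can be illustrated with a minimal dynamic programming sketch. This is not the authors' algorithm, only a straightforward O(nm) baseline written for illustration; the function name `gapped_match_ends` and its parameters are hypothetical. It assumes 0-based text positions, character classes given as Python sets, and one `(a, b)` pair per gap, and it reports the positions at which the last class $p_{m-1}$ can end a match.

```python
from itertools import accumulate


def gapped_match_ends(text, classes, gaps):
    """Return the text positions where the last class of the pattern can match.

    classes: list of sets, classes[i] plays the role of p_i.
    gaps:    list of (a_i, b_i) pairs, one between each pair of classes.
    Illustrative baseline only, not the algorithm from the paper.
    """
    if len(gaps) != len(classes) - 1:
        raise ValueError("need exactly one gap between consecutive classes")
    n = len(text)
    # Row 0: positions where the first class p_0 matches.
    row = [ch in classes[0] for ch in text]
    for cls, (a, b) in zip(classes[1:], gaps):
        # prefix[k] = number of True entries in row[0:k]
        prefix = [0] + list(accumulate(int(x) for x in row))
        new_row = [False] * n
        for j, ch in enumerate(text):
            if ch not in cls:
                continue
            # The previous class must have matched at some j' with
            # j' + a + 1 <= j <= j' + b + 1, i.e. j - b - 1 <= j' <= j - a - 1.
            lo = max(0, j - b - 1)
            hi = min(n - 1, j - a - 1)
            if lo <= hi and prefix[hi + 1] - prefix[lo] > 0:
                new_row[j] = True
        row = new_row
    return [j for j, ok in enumerate(row) if ok]
```

Because each row depends only on the previous row, negative values of $a_i$ or $b_i$ need no special handling: the window of admissible predecessor positions simply moves to the right of $j$. For example, `gapped_match_ends("ACGTAC", [{"A"}, {"G", "T"}], [(0, 1)])` reports position 2, since the `A` at position 0 may be followed by a class match at positions 1 or 2.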