Structural analysis of gapped motifs of a string

  • Authors:
  • Esko Ukkonen

  • Affiliations:
  • Helsinki Institute for Information Technology, Helsinki University of Technology and University of Helsinki, University of Helsinki

  • Venue:
  • MFCS'07 Proceedings of the 32nd international conference on Mathematical Foundations of Computer Science
  • Year:
  • 2007

Abstract

We investigate the structure of the set of gapped motifs (repeated patterns with don't cares) of a given string of symbols. A natural equivalence classification is introduced for the motifs, based on their pattern of occurrences, and another classification is introduced for the occurrence patterns, based on the motifs they induce. Quadratic-time algorithms are given for finding a maximal representative of an equivalence class, while the problems of finding a minimal representative are shown to be NP-complete. Maximal gapped motifs are shown to be composed of blocks that are maximal non-gapped motifs, which can be found using suffix-tree techniques. This leads to a bound on the number of gapped motifs that have a fixed number of non-gapped blocks.
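
As an illustration of the notions in the abstract, the following is a minimal sketch and not the algorithm from the paper. It assumes that `.` denotes a don't-care symbol, that the string itself contains no `.`, and that a maximal representative can be read off by aligning the string at the given occurrence positions and keeping only the columns where all occurrences agree. The function names `occurrences` and `maximal_motif` are hypothetical and chosen for this example only.

```python
from typing import List

DONT_CARE = "."  # assumed don't-care symbol; the text must not contain it


def occurrences(pattern: str, text: str) -> List[int]:
    """Return all start positions where the gapped pattern matches the text."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        # A position matches if every non-don't-care symbol agrees with the text.
        if all(p == DONT_CARE or p == text[i + j] for j, p in enumerate(pattern)):
            hits.append(i)
    return hits


def maximal_motif(text: str, positions: List[int]) -> str:
    """Align the text at the given positions and keep the columns that agree.

    Columns where the aligned copies differ become don't cares; leading and
    trailing don't cares are stripped. Takes O(len(text) * len(positions)) time.
    """
    if not positions:
        raise ValueError("at least one occurrence position is required")
    lo = -min(positions)                 # leftmost offset valid for every position
    hi = len(text) - 1 - max(positions)  # rightmost offset valid for every position
    columns = []
    for d in range(lo, hi + 1):
        symbols = {text[p + d] for p in positions}
        columns.append(symbols.pop() if len(symbols) == 1 else DONT_CARE)
    return "".join(columns).strip(DONT_CARE)


if __name__ == "__main__":
    text = "acbxadb"
    print(occurrences("a", text))        # same occurrence list as "a.b"
    print(occurrences("a.b", text))
    print(maximal_motif(text, [0, 4]))   # positions of "a"
    print(maximal_motif(text, [2, 6]))   # positions of "b", a shifted copy
```

In this example the motifs `a`, `b` and `a.b` all occur twice in `acbxadb` with the same spacing between occurrences, and the sketch extends each of them to the same motif `a.b`. Treating occurrence lists that differ only by a common shift as the same occurrence pattern is an assumption made here for illustration.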