On the complexity of finding gapped motifs

  • Authors:
  • Morris Michael;François Nicolas;Esko Ukkonen

  • Affiliations:
  • Department of Computer Science, PO Box 68 (Gustaf Hällströmin katu 2b), FIN-00014 University of Helsinki, Finland;Department of Computer Science, PO Box 68 (Gustaf Hällströmin katu 2b), FIN-00014 University of Helsinki, Finland;Department of Computer Science, PO Box 68 (Gustaf Hällströmin katu 2b), FIN-00014 University of Helsinki, Finland

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

A gapped pattern is a sequence consisting of regular alphabet symbols and of joker symbols that match any alphabet symbol. The content of a gapped pattern is defined as the number of its non-joker symbols. A gapped motif is a gapped pattern that occurs repeatedly in a string or in a set of strings. The aim of this paper is to study the complexity of several gapped-motif-finding problems. The following three decision problems are shown NP-complete, even if the input alphabet is binary. (i) Given a string T and two integers c and q, decide whether or not there exists a gapped pattern with content c (or more) that occurs in T at q distinct positions (or more). (ii) Given a set of strings S and an integer c, decide whether or not there exists a gapped pattern with content c that occurs at least once in each string of S. (iii) Given m strings with the same length, and two integers c and q, decide whether or not there exists a gapped pattern with content c that matches at least q input strings. We also present a non-naive quadratic-time algorithm that solves the following optimization problem: given a string T and an integer q=1, compute a maximum-content gapped pattern Q such that q consecutive copies of Q occur in T.