Motif inference is one of the most important areas of research in computational biology, and one of its oldest. Despite this, the problem remains very much open in the sense that no existing definition is fully satisfying, either in formal terms or in relation to the biological questions that involve finding such motifs. Two main types of motifs have been considered in the literature: matrices (of letter frequency per position in the motif) and patterns. There is no conclusive evidence in favor of either, and recent work has attempted to integrate the two types into a single model. In this paper, we address the formal issue in relation to motifs as patterns. This is essential for a better understanding of motifs in general. In particular, we consider a promising, recently proposed idea that attempts to avoid the combinatorial explosion in the number of motifs by means of a generator set for the motifs. Instead of exhibiting a complete list of motifs satisfying some input constraints, what is produced is a basis of such motifs from which all the other ones can be generated. We study the computational cost of determining such a basis of repeated motifs with wild cards in a sequence. We give new upper and lower bounds on this cost, introducing a notion of basis that is provably contained in, and thus smaller than, previously defined ones. Our basis can be computed in less time and space, and is still able to generate the same set of motifs. We also prove that the number of motifs in all bases defined so far grows exponentially with the quorum, that is, with the minimal number of times a motif must appear in a sequence, something unnoticed in previous work. We show that there is no hope of efficiently computing such bases unless the quorum is fixed.
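
To make the notions of a motif with wild cards and of a quorum concrete, the following is a minimal illustrative sketch, not the algorithm studied in the paper. It simply counts the positions at which a pattern occurs in a sequence and checks that count against a quorum. The function names `occurrences` and `satisfies_quorum`, and the use of `.` as the wild card symbol, are assumptions made for this example only.

```python
def occurrences(motif: str, seq: str, wildcard: str = ".") -> list[int]:
    """Return the start positions at which `motif` occurs in `seq`.

    A wild card position in the motif matches any single letter of the
    sequence; every other position must match exactly.
    """
    m = len(motif)
    return [
        i
        for i in range(len(seq) - m + 1)
        if all(c == wildcard or c == seq[i + j] for j, c in enumerate(motif))
    ]


def satisfies_quorum(motif: str, seq: str, quorum: int, wildcard: str = ".") -> bool:
    """True if `motif` occurs at least `quorum` times in `seq`."""
    return len(occurrences(motif, seq, wildcard)) >= quorum


if __name__ == "__main__":
    seq = "ABCAACADC"
    print(occurrences("A.C", seq))          # start positions of A.C in seq
    print(satisfies_quorum("A.C", seq, 3))  # does A.C reach quorum 3?
```

This brute-force check takes time proportional to the product of the motif and sequence lengths for a single candidate motif. It says nothing about how to enumerate motifs or build a basis, which is where the cost bounds discussed in the abstract apply.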