An Output-Sensitive Flexible Pattern Discovery Algorithm
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Given an input sequence of data, a "rigid" pattern is a repeating sequence, possibly interspersed with "don't care" characters. In practice, the patterns or motifs of interest are those that also allow a variable number of gaps (or "don't care" characters): we call these the flexible motifs. The number of rigid motifs can be exponential in the size of the input sequence, and when the input is a sequence of real numbers, there can be an uncountably infinite number of motifs (assuming two real numbers are equal if they are within some δ > 0 of each other). It has been shown earlier that, by suitably defining the notions of maximality and redundancy, there are only a linear number (no more than 3n) of irredundant motifs and that a polynomial-time algorithm detects them. Here we present a uniform framework that encompasses both rigid and flexible motifs, with generalizations to sequences of sets and real numbers, and show the somewhat surprising result that the number of irredundant flexible motifs still has a linear bound. However, the algorithm to detect them has a higher complexity than that for rigid motifs.
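To make the distinction between rigid and flexible motifs concrete, the sketch below finds occurrences of each kind in a short string. It is only an illustration of the definitions, not the detection algorithm the abstract refers to. The motif encoding (a list of solid characters and `(lo, hi)` gap ranges), the helper names `motif_to_regex` and `occurrences`, and the sample text are all assumptions introduced here.

```python
import re

def motif_to_regex(motif):
    """Translate a motif into a regular expression string.

    A motif is a list whose elements are either a solid character (str)
    or a gap (lo, hi): between lo and hi "don't care" characters.
    A rigid motif uses only gaps with lo == hi; a flexible one may not.
    """
    parts = []
    for element in motif:
        if isinstance(element, str):
            parts.append(re.escape(element))
        else:
            lo, hi = element
            parts.append(".{%d,%d}" % (lo, hi))
    return "".join(parts)

def occurrences(motif, text):
    """Return every start position at which the motif matches the text."""
    # A zero-width lookahead reports overlapping occurrences as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(text)]

rigid = ["a", (1, 1), "c"]     # 'a', exactly one don't care, 'c'
flexible = ["a", (1, 2), "c"]  # 'a', one or two don't cares, 'c'

text = "abcaxxcabbc"
print(occurrences(rigid, text))     # [0]
print(occurrences(flexible, text))  # [0, 3, 7]
```

With the fixed gap the motif occurs only once in this text, so it does not repeat; allowing the gap to vary between one and two characters yields three occurrences, which is the sense in which flexible motifs capture patterns that rigid ones miss.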