We address the problem of extracting pairs of subwords (m1, m2) from a text string s of length n such that, given an integer constant d as additional input, m1 and m2 occur in tandem within a maximum distance of d symbols in s. The main goal of this work is to eliminate possible redundancy from the candidate set of tandem motifs found in this way. To this end, we first introduce a concept of maximality, characterized by four specific conditions, which we show cannot be deduced from the corresponding notion of maximality already defined for "simple" (i.e., non-tandem) motifs. We then eliminate the remaining redundancy by defining the concept of irredundancy for tandem motifs. We prove that the number of non-overlapping irredundant tandems is O(d^2 n), which, treating d as a constant, is linear in the length of the input string. This is an order of magnitude smaller than previously developed compact indexes for tandem extraction. As a further contribution, we give an algorithm to extract this compact irredundant index.
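To make the problem statement concrete, the following brute-force Python sketch illustrates only the *input* to the redundancy-elimination step, not the maximality or irredundancy filtering described above. It assumes one reading of "in tandem within a maximum distance of d": an occurrence of m2 starts after the occurrence of m1 ends, with a gap of 0 to d symbols, so the two occurrences never overlap. The function names `tandem_occurrences` and `candidate_tandems` are hypothetical and are not taken from the paper.

```python
def tandem_occurrences(s: str, m1: str, m2: str, d: int) -> list[tuple[int, int]]:
    """Hypothetical helper. Return (i, j) pairs where m1 starts at i, m2 starts
    at j, and the gap between the end of m1 and the start of m2 is 0..d
    (assumed reading of the distance constraint; occurrences never overlap)."""
    if not m1 or not m2 or d < 0:
        return []
    starts1 = [i for i in range(len(s) - len(m1) + 1) if s.startswith(m1, i)]
    starts2 = {j for j in range(len(s) - len(m2) + 1) if s.startswith(m2, j)}
    result = []
    for i in starts1:
        end1 = i + len(m1)
        # m2 may start anywhere from the end of m1 up to d symbols later
        for j in range(end1, end1 + d + 1):
            if j in starts2:
                result.append((i, j))
    return result


def candidate_tandems(s: str, d: int) -> set[tuple[str, str]]:
    """Hypothetical helper. Brute force: every distinct pair (m1, m2) of
    non-empty subwords of s with at least one tandem occurrence (gap <= d).
    This is the redundant candidate set, before any maximality or
    irredundancy filtering."""
    n = len(s)
    pairs = set()
    for i in range(n):                                  # start of m1
        for e1 in range(i + 1, n + 1):                  # m1 = s[i:e1]
            for j in range(e1, min(e1 + d, n - 1) + 1):  # start of m2
                for e2 in range(j + 1, n + 1):          # m2 = s[j:e2]
                    pairs.add((s[i:e1], s[j:e2]))
    return pairs


if __name__ == "__main__":
    s = "abcabxabc"
    print(tandem_occurrences(s, "ab", "c", 1))  # [(0, 2), (6, 8)]
    print(tandem_occurrences(s, "ab", "c", 3))  # [(0, 2), (3, 8), (6, 8)]
    print(len(candidate_tandems(s, 2)))
```

The brute-force loop examines O(n^3 d) index tuples, and many of the resulting pairs are trivially implied by one another (for example, ("a", "aa") and ("a", "a") in the string "aaa"). This is the kind of redundancy that the maximality and irredundancy notions are meant to remove.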