Algorithms for finding patterns in strings. Handbook of theoretical computer science (vol. A).
Pattern matching algorithms. SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms.
A guided tour to approximate string matching. ACM Computing Surveys (CSUR).
Verifying candidate matches in sparse and wildcard matching. STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing.
Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications.
Algorithmic techniques in computational genomics.
Bases of Motifs for Generating Repeated Patterns with Wild Cards. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Optimal offline extraction of irredundant motif bases. COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics.
Optimal extraction of motif patterns in 2D. Information Processing Letters.
MADMX: a novel strategy for maximal dense motif extraction. WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics.
Note: Extracting string motif bases for quorum higher than two. Theoretical Computer Science.
Characterization and extraction of irredundant tandem motifs. SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval.
Compact bases formed by motifs called "irredundant", and capable of generating all other motifs in a sequence, have been proposed in recent years and successfully tested in biosequence analysis and classification tasks. Given a sequence s of n characters drawn from an alphabet $\Sigma$, the problem of extracting such a base from s had previously been solved by two earlier algorithms, running in time $O(n^2 \log n \log |\Sigma|)$ and $O(|\Sigma| n^2 \log^2 n \log\log n)$ respectively, both using the FFT-based string searching of Fischer and Paterson. More recently, an $O(n^2)$-time solution for binary strings that does not resort to the FFT was also proposed. In the present paper, we consider the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in previous work in time $O(n^3)$. We describe a much faster incremental algorithm, which takes time $O(n^2 \log n)$ for binary strings. Although this algorithm does not use the FFT, its running time is comparable to that of the previous FFT-based algorithms, which compute only one base. Since the implicit representation of a single base requires $O(n)$ space, for finite alphabets the proposed solution is within a $\log n$ factor of optimal.
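To make the objects in the abstract concrete, the following is a minimal illustrative sketch, not the paper's incremental algorithm. It assumes a binary string over the characters "0" and "1" and uses "." as the wild card; the function names `occurrences`, `meet_with_suffix`, and `autocorrelation_meets` are hypothetical. `occurrences` is the brute-force form of string searching with wild cards (the task that FFT-based methods such as Fischer and Paterson's accelerate). `meet_with_suffix` shows one simple way to obtain a motif with wild cards that occurs at least twice in s: for a shift d, keep the characters where s and its suffix starting at d agree, and put wild cards elsewhere. Deciding which such candidates are irredundant, and doing so incrementally for all suffixes, is the subject of the paper and is not attempted here.

```python
# Illustrative sketch only: assumes a binary alphabet and '.' as the wild card.
# All names are hypothetical and not taken from the paper.

WILD = "."


def occurrences(pattern: str, s: str) -> list[int]:
    """Brute-force O(n*m) search for a pattern containing wild cards.

    A wild card matches any character; every other pattern character
    must equal the corresponding text character.
    """
    n, m = len(s), len(pattern)
    result = []
    for i in range(n - m + 1):
        if all(p == WILD or p == s[i + j] for j, p in enumerate(pattern)):
            result.append(i)
    return result


def meet_with_suffix(s: str, d: int) -> str:
    """Consensus of s with its suffix starting at shift d (d >= 1).

    Position i keeps s[i] when s[i] == s[i + d] and becomes a wild card
    otherwise; leading and trailing wild cards are then trimmed. The
    untrimmed pattern matches s at offsets 0 and d, so a nonempty result
    occurs at least twice.
    """
    raw = "".join(
        s[i] if s[i] == s[i + d] else WILD for i in range(len(s) - d)
    )
    return raw.strip(WILD)


def autocorrelation_meets(s: str) -> dict[int, str]:
    """Meets of s with each proper suffix: n - 1 candidates in O(n^2) total time."""
    return {d: meet_with_suffix(s, d) for d in range(1, len(s))}


if __name__ == "__main__":
    s = "0010011"
    for d, motif in autocorrelation_meets(s).items():
        if motif:
            print(d, motif, occurrences(motif, s))
```

For s = "0010011", the shift-3 meet is "001", which occurs at positions 0 and 3, and the shift-1 meet is "0..0.1", which occurs at positions 0 and 1. The brute-force search is used here only for clarity; its cost per pattern is what FFT-based matching with wild cards is designed to reduce.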