Incremental discovery of the irredundant motif bases for all suffixes of a string in O(n^2 log n) time

Authors:
Alberto Apostolico;Claudia Tagliacollo
Affiliations:
Accademia Nazionale dei Lincei, Rome, Italy and College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA 30318, USA;Accademia Nazionale dei Lincei, Rome, Italy
Venue:
Theoretical Computer Science
Year:
2008

Abstract

Compact bases formed by motifs called "irredundant", capable of generating all other motifs in a sequence, have been proposed in recent years and successfully tested in tasks of biosequence analysis and classification. Given a sequence s of n characters drawn from an alphabet Σ, the problem of extracting such a base from s had previously been solved in time O(n^2 log n log|Σ|) and O(|Σ| n^2 log^2 n log log n), respectively, using the FFT-based string searching of Fischer and Paterson. More recently, a solution for binary strings that takes time O(n^2) without resorting to the FFT was also proposed. In the present paper, we consider the problem of incrementally extracting the bases of all suffixes of a string. This problem was solved in previous work in time O(n^3). A much faster incremental algorithm is described here, which takes time O(n^2 log n) for binary strings. Although this algorithm does not make use of the FFT, its performance is comparable to that of the previous FFT-based algorithms, which compute only one base. The implicit representation of a single base requires O(n) space, whence for finite alphabets the proposed solution is within a log n factor of optimality.
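
The closing optimality claim can be read as a rough size count. The sketch below is only an accounting under two assumptions that the abstract does not state explicitly: that the base of each of the n suffixes is output separately, and that the O(n) space bound per base is attained in the worst case.

```latex
% Assumed: n suffixes, each with a base of worst-case size \Theta(n)
\underbrace{n}_{\text{suffixes}} \times \underbrace{\Theta(n)}_{\text{space per base}} = \Theta(n^2)
\quad\Longrightarrow\quad
\frac{O(n^2 \log n)}{\Theta(n^2)} = O(\log n)
```

Under those assumptions, any algorithm must spend time proportional to n^2 just to write the output, so a running time of O(n^2 log n) exceeds that lower bound by a factor of at most log n.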