Extracting approximate patterns

Authors:
Johann Pelfrêne;Saïd Abdeddaïm;Joël Alexandre
Affiliations:
ExonHit Therapeutics, Boulevard Masséna, Paris and ABISS, UMR, CNRS, Université de Rouen, Mont Saint Aignan;ABISS, LIFAR, Université de Rouen, Mont Saint Aignan;ABISS, UMR, CNRS, Université de Rouen, Mont Saint Aignan
Venue:
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Year:
2003

Abstract

In a sequence, approximate patterns are exponential in number. In this paper, we present a new notion of basis for the patterns with don't cares occurring in a given text (sequence). The primitive patterns are of interest because their number is lower than under previously known definitions (and, in one case, sub-linear in the size of the text), and because they can be used to extract all the patterns of a text. We present an incremental algorithm that computes the primitive patterns occurring at least q times in a text of length n, given the N primitive patterns occurring at least q-1 times, in time O(|Σ| N n^2 log^2 n log log n). In the particular case where q = 2, the time complexity is only O(|Σ| n^2 log^2 n log log n). We also give an algorithm that decides whether a given pattern is primitive in a given text.
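
To make the vocabulary of the abstract concrete, the following minimal Python sketch shows what it means for a pattern with don't cares to occur at least q times in a text. It is a naive illustration only, not the incremental algorithm of the paper, and it does not test primitivity, which the abstract does not define. The wildcard symbol `.` and the function names `occurrences` and `meets_quorum` are assumptions introduced here for illustration.

```python
# Assumed wildcard symbol; the abstract does not fix a notation for don't cares.
DONT_CARE = "."


def occurrences(pattern: str, text: str) -> list[int]:
    """Return the start positions where `pattern` matches `text`.

    A don't-care symbol in the pattern matches any single character.
    This is a naive O(len(pattern) * len(text)) scan.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == DONT_CARE or p == text[i + j] for j, p in enumerate(pattern))
    ]


def meets_quorum(pattern: str, text: str, q: int) -> bool:
    """True if `pattern` occurs at least q times in `text`."""
    return len(occurrences(pattern, text)) >= q


if __name__ == "__main__":
    text = "abcabdabc"
    print(occurrences("ab.", text))      # positions of the pattern with one don't care
    print(meets_quorum("ab.", text, 3))  # quorum check for q = 3
    print(meets_quorum("abc", text, 3))  # the solid pattern occurs less often
```

In this toy text, replacing the last letter of `abc` with a don't care raises the number of occurrences from two to three, which is why the set of patterns meeting a given quorum grows quickly and a compact basis is useful.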