Removing artifacts of approximated motifs

Authors:
Maria Federico; Nadia Pisanti
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Italy and Dipartimento di Informatica, Università di Pisa, Italy; Dipartimento di Informatica, Università di Pisa, Italy
Venue:
ITBAM'11 Proceedings of the Second international conference on Information technology in bio- and medical informatics
Year:
2011

Abstract

Frequent patterns (motifs) in biological sequences are good candidates for structurally or functionally important elements. Existing tools for the exhaustive detection of approximated motifs typically output a long list that contains some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts inflate the output, often making the results redundant and hard for biologists to use. In this paper, we provide a new solution to the problem of separating real motifs from artifacts. We define a notion of motif maximality, called maximality in conservation, which, when applied to the output of existing motif finding tools, allows us to identify and remove artifacts. Detection relies on the fact that the variations of a motif share a large subset of the occurrences of the real motif, while the real motif is more conserved than any of its artifacts. Experiments show that the tool we implemented according to this definition substantially reduces the output size by removing artifacts, at a negligible time cost.
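
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' definition of maximality in conservation, which the abstract does not spell out. It assumes, purely for illustration, that each motif comes with its set of occurrence positions and a total mismatch count, that "more conserved" means fewer mismatches per occurrence, and that "a large subset of occurrences" means at least a fraction `min_shared` of them. The names `Motif`, `conservation`, `is_artifact`, `remove_artifacts`, and `min_shared` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Hypothetical record for one motif reported by a motif finder."""
    pattern: str
    occurrences: frozenset  # start positions of the approximate occurrences
    mismatches: int         # mismatches summed over all occurrences


def conservation(m: Motif) -> float:
    """Average mismatches per occurrence (assumed measure; lower = more conserved)."""
    if not m.occurrences:
        return float("inf")
    return m.mismatches / len(m.occurrences)


def is_artifact(candidate: Motif, other: Motif, min_shared: float = 0.8) -> bool:
    """True if `candidate` looks like a less conserved variation of `other`.

    Assumption: a candidate is an artifact when at least `min_shared` of its
    occurrences are also occurrences of `other`, and `other` is strictly
    more conserved.
    """
    if not candidate.occurrences:
        return False
    shared = len(candidate.occurrences & other.occurrences)
    shares_enough = shared / len(candidate.occurrences) >= min_shared
    return shares_enough and conservation(other) < conservation(candidate)


def remove_artifacts(motifs: list, min_shared: float = 0.8) -> list:
    """Keep only motifs that are not artifacts of any other motif in the list."""
    return [
        m for m in motifs
        if not any(is_artifact(m, o, min_shared) for o in motifs if o is not m)
    ]


# Toy input: one well conserved motif, one noisy variation of it, one unrelated motif.
real = Motif("ACGT", frozenset({0, 10, 20, 30, 40}), mismatches=1)
variant = Motif("ACGA", frozenset({0, 10, 20, 30}), mismatches=4)
unrelated = Motif("TTGA", frozenset({100, 200}), mismatches=0)

kept = remove_artifacts([real, variant, unrelated])
print([m.pattern for m in kept])
```

In this toy input, `variant` shares all of its occurrences with `real` and is less conserved, so it is dropped. The pairwise comparison is quadratic in the number of motifs, which is adequate for a sketch but says nothing about how the actual tool achieves its negligible time cost.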