Removing artifacts of approximated motifs

  • Authors:
  • Maria Federico; Nadia Pisanti

  • Affiliations:
  • Dipartimento di Ingegneria dell'Informazione, Università di Modena e Reggio Emilia, Italy and Dipartimento di Informatica, Università di Pisa, Italy; Dipartimento di Informatica, Università di Pisa, Italy

  • Venue:
  • ITBAM'11 Proceedings of the Second International Conference on Information Technology in Bio- and Medical Informatics
  • Year:
  • 2011

Abstract

Frequent patterns (motifs) in biological sequences are good candidates for structurally or functionally important elements. Existing tools for the exhaustive detection of approximated motifs typically output a long list that contains some real motifs (i.e., patterns representing functional elements) along with a large number of random variations of them, called artifacts. Artifacts inflate the output and often make the results redundant and hard for biologists to use. In this paper, we provide a new solution to the problem of separating real motifs from artifacts. We define a notion of motif maximality, called maximality in conservation, which, when applied to the output of existing motif finding tools, allows us to identify and remove artifacts. Detection relies on the fact that the variations of a motif share a large subset of the occurrences of the real motif, while the real motif is more conserved than any of its artifacts. Experiments show that the tool we implemented according to this definition considerably reduces the output size by removing artifacts, at a negligible time cost.
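
The intuition stated in the abstract (artifacts share most of their occurrences with a real motif, which is itself more conserved) can be illustrated with a small sketch. This is not the paper's definition of maximality in conservation, which the abstract does not spell out. It is a simplified stand-in that assumes Hamming distance as the mismatch model, mean distance over occurrences as the conservation measure, and a hypothetical `min_shared` overlap threshold. All function and parameter names are illustrative.

```python
from typing import Dict, List, Tuple

Occ = Tuple[int, int]  # (sequence index, start position)


def find_occurrences(motif: str, seqs: List[str], max_mismatches: int) -> Dict[Occ, int]:
    """Map each approximate occurrence to its Hamming distance from the motif."""
    hits: Dict[Occ, int] = {}
    m = len(motif)
    for i, seq in enumerate(seqs):
        for p in range(len(seq) - m + 1):
            d = sum(a != b for a, b in zip(motif, seq[p:p + m]))
            if d <= max_mismatches:
                hits[(i, p)] = d
    return hits


def mean_distance(hits: Dict[Occ, int]) -> float:
    """Assumed conservation measure: lower mean distance = more conserved."""
    return sum(hits.values()) / len(hits)


def remove_artifacts(motifs: List[str], seqs: List[str],
                     max_mismatches: int = 1, min_shared: float = 0.5) -> List[str]:
    """Drop every motif that shares at least `min_shared` of its occurrences
    with another motif that is strictly more conserved (illustrative rule)."""
    occ = {m: find_occurrences(m, seqs, max_mismatches) for m in motifs}
    kept = []
    for m in motifs:
        is_artifact = False
        for other in motifs:
            if other == m or not occ[m] or not occ[other]:
                continue
            shared = len(occ[m].keys() & occ[other].keys()) / len(occ[m])
            if shared >= min_shared and mean_distance(occ[other]) < mean_distance(occ[m]):
                is_artifact = True
                break
        if not is_artifact:
            kept.append(m)
    return kept


if __name__ == "__main__":
    # Toy data: "ACGTA" occurs exactly in every sequence; the other two
    # candidates only match those same positions with one mismatch.
    seqs = ["TTACGTACC", "GGACGTAGG", "CACGTATT"]
    print(remove_artifacts(["ACGTA", "ACGTT", "ACCTA"], seqs))
```

On this toy input the two one-mismatch variants are discarded and only "ACGTA" is kept, because all three candidates cover the same positions but "ACGTA" matches them exactly. The quadratic comparison over motif pairs is chosen for readability and says nothing about how the authors' tool achieves its reported running time.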