Discovery of Ambiguous Patterns in Sequences: Application to Bioinformatics

  • Authors:
  • Gerard Ramstein; Pascal Bunelle; Yannick Jacques

  • Venue:
  • PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2000

Abstract

An important issue in data mining is the discovery of patterns that reach a user-specified minimum support. We generalize this problem by introducing the concept of an ambiguous event: an event that can be substituted for another without modifying the substance of the pattern that contains it. For instance, in molecular biology, researchers attempt to identify conserved patterns in a family of proteins known to have evolved from a common ancestor. Such patterns are flexible in the sense that some residues may have been substituted for others during evolution. The notation A[B C], for example, denotes an ambiguous pattern representing the event A followed by either the event B or the event C. We propose a new scoring scheme, based on substitution matrices, for computing the frequency of ambiguous patterns; a substitution matrix expresses the probability that one event is replaced by another. We then adapt the Winepi algorithm [1] to ambiguous events. Finally, we apply the method to the discovery of conserved patterns in a particular family of proteins, the cytokine receptors.
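The abstract does not give the exact scoring formula, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that a pattern such as A[B C] is a list of positions, each holding a set of interchangeable events. A residue that belongs to a position's set earns a credit of 1.0. Any other residue earns its best substitution probability against that set, and a window's score is the best product of credits over all in-order (serial) placements of the pattern. The frequency is then the average score over all Winepi windows of width `win`, counting the n + win - 1 windows that include those overlapping either end of the sequence. The function names (`parse_pattern`, `weight`, `window_score`, `ambiguous_frequency`) and the substitution values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# An ambiguous pattern: one set of interchangeable events per position,
# e.g. A[B C] -> [{'A'}, {'B', 'C'}].
Pattern = List[FrozenSet[str]]
# Hypothetical substitution matrix: (a, b) -> probability in [0, 1] that one
# residue replaces the other. Missing pairs count as 0 (an assumption).
SubstMatrix = Dict[Tuple[str, str], float]


def parse_pattern(text: str) -> Pattern:
    """Parse notation like 'A[B C]': single-letter events outside brackets,
    whitespace-separated alternatives inside brackets."""
    pattern: Pattern = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            end = text.index("]", i)  # raises ValueError if unbalanced
            pattern.append(frozenset(text[i + 1:end].split()))
            i = end + 1
        else:
            if not ch.isspace():
                pattern.append(frozenset(ch))
            i += 1
    return pattern


def weight(residue: str, allowed: FrozenSet[str], subst: SubstMatrix) -> float:
    """Credit for matching one residue against one pattern position:
    1.0 for a member of the set, else the best substitution probability."""
    if residue in allowed:
        return 1.0
    return max(
        (max(subst.get((residue, a), 0.0), subst.get((a, residue), 0.0))
         for a in allowed),
        default=0.0,
    )


def window_score(window: str, pattern: Pattern, subst: SubstMatrix) -> float:
    """Best product of credits over all in-order embeddings of the pattern
    in the window, computed by dynamic programming."""
    k = len(pattern)
    best = [1.0] + [0.0] * k  # best[j]: best score for the first j positions
    for residue in window:
        # Iterate backwards so each residue fills at most one position.
        for j in range(k, 0, -1):
            best[j] = max(best[j], best[j - 1] * weight(residue, pattern[j - 1], subst))
    return best[k]


def ambiguous_frequency(seq: str, pattern: Pattern, win: int,
                        subst: SubstMatrix) -> float:
    """Winepi-style frequency: mean window score over the n + win - 1 windows
    of width `win`, including windows that overlap the sequence ends."""
    n = len(seq)
    starts = range(-win + 1, n)
    total = sum(window_score(seq[max(0, t):min(n, t + win)], pattern, subst)
                for t in starts)
    return total / len(starts)


if __name__ == "__main__":
    p = parse_pattern("A[B C]")
    # No substitutions: this is the crisp Winepi frequency (2 of 6 windows).
    print(ambiguous_frequency("ABXAC", p, 2, {}))
    # D counts as a partial B (hypothetical probability 0.5).
    print(ambiguous_frequency("ADXAC", p, 2, {("D", "B"): 0.5}))  # → 0.25
```

With an empty substitution matrix, each window scores either 0 or 1. The result is then the fraction of windows that contain the serial pattern, which is Winepi's ordinary frequency. The substitution credits only soften that count, so residues that were plausibly substituted during evolution still contribute partial support.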