Discovering Frequent Structured Patterns from String Databases: An Application to Biological Sequences

  • Authors:
  • Luigi Palopoli; Giorgio Terracina

  • Venue:
  • DS '02 Proceedings of the 5th International Conference on Discovery Science
  • Year:
  • 2002

Abstract

In recent years, the completion of the human genome sequencing has raised a wide range of challenging new issues in raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences. Motivated by biological observations, one specific class of patterns is becoming particularly interesting: frequent structured patterns. In this respect, it is biologically meaningful to look at both "exact" and "approximate" repetitions of the patterns within the available sequences. This paper contributes to this setting by providing algorithms that discover frequent structured patterns, in either "exact" or "approximate" form, in a collection of input biological sequences.