DNA Motif Representation with Nucleotide Dependency

  • Authors:
  • Francis Chin; Henry C. M. Leung

  • Affiliations:
  • -;-

  • Venue:
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
  • Year:
  • 2008

Abstract

The problem of discovering novel motifs of binding sites is important to the understanding of gene regulatory networks. Motifs are generally represented by matrices (PWM or PSSM) or strings. However, these representations cannot model biological binding sites well because they fail to capture nucleotide interdependence. Many researchers have pointed out that the nucleotides of a DNA binding site cannot be treated independently, e.g., the binding sites of zinc fingers in proteins. In this paper, a new representation called Scored Position Specific Pattern (SPSP) is introduced. It generalizes the matrix and string representations and takes into consideration the dependent occurrences of neighboring nucleotides. Even though the problem of discovering the optimal motif in the SPSP representation is proved to be NP-hard, we introduce a heuristic algorithm called SPSP-Finder, which can effectively find optimal motifs in most simulated cases and in some real cases for which existing popular motif-finding software, such as Weeder, MEME and AlignACE, fails.
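To make the limitation of matrix representations concrete, the following minimal Python sketch builds a simple frequency matrix from a handful of binding sites and shows that it cannot tell a co-varying pair of neighboring nucleotides from a pair that never occurs. The site sequences and the helper names `build_pwm` and `pwm_probability` are hypothetical and chosen only for illustration; the sketch does not implement SPSP or SPSP-Finder, whose definitions are not given in this abstract.

```python
from collections import Counter


def build_pwm(sites, alphabet="ACGT"):
    """Column-wise nucleotide frequencies; every position is modeled independently."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({base: counts[base] / len(sites) for base in alphabet})
    return pwm


def pwm_probability(pwm, seq):
    """Probability of seq under the matrix: a product of per-position frequencies."""
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[base]
    return p


# Hypothetical binding sites: the two middle positions co-vary
# (they are always "AA" or "TT", never "AT" or "TA").
sites = ["GAAC", "GTTC", "GAAC", "GTTC"]
pwm = build_pwm(sites)

observed = pwm_probability(pwm, "GAAC")    # pattern present in the sites
unobserved = pwm_probability(pwm, "GATC")  # pattern absent from the sites

print(observed, unobserved)
```

Both sequences receive the same probability under the matrix, even though "GATC" never appears among the sites. A representation that accounts for dependent neighboring nucleotides would be able to separate the two.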