PROMOCO: a New Program for Prediction of cis Regulatory Elements: From High-Information Content Analysis to Clique Identification

  • Authors: Guojun Li; Jizhu Lu; Victor Olman; Ying Xu

  • Affiliations: University of Georgia (all authors)

  • Venue: CSBW '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops
  • Year: 2005

Abstract

We present a computational study of the prediction of cis regulatory elements. We model the problem as follows. Each set of conserved binding motifs, evolved from one common ancestor, lies within a short Hamming distance of that ancestor. The problem is to identify a set of l-mers, drawn from a given set of promoter sequences, that each differ from the to-be-identified ancestor in at most k positions. A number of previously published papers attempt to solve this challenging problem. Although the putative ancestor is unknown, and may not even appear anywhere in the background database, we may assume that an instance of it is at hand, since we can guess it. Our main contribution in this paper is an algorithm, named PROMOCO (PROfile Motif Collection), that finds a profile containing all the motifs and a relatively small number of random l-mers, so that the consensus of the profile is the putative ancestor. The key idea of the PROMOCO algorithm lies in a new distance measure.

Two classes of computational approaches have been developed and widely used for the prediction of cis regulatory elements. One class essentially treats the problem as the identification of a group of l-mers that exhibit high information content when aligned. The other solves the problem by identifying cliques in a graph representation of l-mers, in which two l-mers are linked by an edge if and only if their Hamming distance is below some predefined threshold. While the two classes are intuitively similar, the detailed relationship between them has not been carefully investigated.
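
The quantities the abstract names (Hamming distance to a guessed ancestor, the consensus and information content of a profile, and the l-mer similarity graph) can be illustrated with a minimal Python sketch. This is not the PROMOCO algorithm, and the new distance measure it relies on is not described in the abstract. All function names, the toy sequences, the uniform background frequency of 0.25, and the inclusive distance thresholds are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def hamming(a, b):
    """Number of positions at which two equal-length l-mers differ."""
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(seq, l):
    """All overlapping l-mers of a sequence, in order of occurrence."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def candidates(seqs, ancestor, k):
    """l-mers differing from a guessed ancestor in at most k positions."""
    l = len(ancestor)
    return [m for s in seqs for m in lmers(s, l) if hamming(m, ancestor) <= k]


def consensus(profile):
    """Most frequent base per column (ties broken alphabetically)."""
    return "".join(
        min(Counter(col).items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for col in zip(*profile)
    )


def information_content(profile, background=0.25):
    """Sum over columns of relative entropy in bits.

    Assumes a uniform background frequency for every base.
    """
    n = len(profile)
    total = 0.0
    for col in zip(*profile):
        for count in Counter(col).values():
            p = count / n
            total += p * log2(p / background)
    return total


def similarity_edges(mers, max_dist):
    """Edges of the graph view: pairs of l-mers within max_dist.

    The inclusive bound is an assumption; the abstract says only that the
    distance is below some predefined threshold.
    """
    return [(a, b) for a, b in combinations(mers, 2) if hamming(a, b) <= max_dist]


# Toy promoter sequences (hypothetical), guessed ancestor "ACGT", k = 1.
toy_seqs = ["TTACGTAA", "GGACGAAC", "CCTCGTGG"]
profile = candidates(toy_seqs, "ACGT", k=1)
print(profile, consensus(profile), information_content(profile))
print(similarity_edges(profile, max_dist=2))
```

By the triangle inequality, any two l-mers within distance k of the same ancestor are within distance 2k of each other, so with a threshold of 2k the selected l-mers form a clique in the similarity graph. This is one simple link between the profile view and the clique view.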