A comparative analysis method for detecting binding sites in coding regions

  • Authors: Mathieu Blanchette

  • Affiliations: University of California, Santa Cruz, Santa Cruz, CA

  • Venue: RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
  • Year: 2003

Abstract

While the problem of predicting transcription factor binding sites in a gene's promoter region has been extensively studied, binding sites located in coding regions are also crucial for regulating gene expression but are more difficult to detect. Coding region binding sites are mostly involved in splicing regulation, but also in transcriptional and post-transcriptional regulation. We consider the problem of predicting such binding sites by comparative analysis. Comparative analysis is based on the idea that functional sequences tend to evolve at a slower rate than nonfunctional sequences, making unusually well-conserved regions likely to be of interest. The difficulty in applying comparative analysis to the detection of binding sites located in coding sequences is that the whole sequence is under selective pressure, because it needs to code for a functional protein. We present a technique to distinguish between conservation due to constraints on the amino acid product and conservation due to constraints imposed by regulatory factors. More precisely, we show how to calculate the probability of observing a certain degree of conservation among the nucleotides of a given set of orthologous codons, given a set of constraints on the amino acids they need to encode. The algorithms described are implemented in a program called Cosmo, available at http://bio.cs.washington.edu. We ran Cosmo on several genes known to contain exonic splicing enhancers and report the results.
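
To make the central idea concrete, the following is a minimal toy sketch and not the algorithm implemented in Cosmo. It assumes, purely for illustration, that each species independently draws its codon uniformly from the synonymous codons of the amino acid it encodes, ignoring phylogeny, codon usage bias, and any mutation model. Under that assumed null model it computes the probability of seeing at least the observed number of fully conserved nucleotide positions in one column of orthologous codons. The names `identical_positions` and `conservation_pvalue` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {}
SYNONYMS = defaultdict(list)  # amino acid -> list of codons encoding it
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    codon = "".join(bases)
    CODON_TO_AA[codon] = aa
    SYNONYMS[aa].append(codon)


def identical_positions(codons):
    """Count the codon positions (0 to 3) at which all codons agree."""
    return sum(len(set(column)) == 1 for column in zip(*codons))


def conservation_pvalue(observed_codons):
    """Probability of at least the observed conservation under the toy null.

    ASSUMPTION (not from the paper): every species picks uniformly and
    independently among the synonymous codons of its own amino acid.
    """
    observed = identical_positions(observed_codons)
    # The amino acid constraint: each species must keep encoding its residue.
    choices = [SYNONYMS[CODON_TO_AA[c]] for c in observed_codons]
    total = 0
    hits = 0
    for combo in product(*choices):  # exhaustive, fine for a few species
        total += 1
        if identical_positions(combo) >= observed:
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    # Three species all using the same glycine codon.
    print(conservation_pvalue(["GGT", "GGT", "GGT"]))
```

A small probability indicates more nucleotide conservation than the amino acid constraint alone would explain under this simplified null, which is the kind of signal the abstract attributes to regulatory constraints. A realistic treatment would replace the uniform, independent draw with an evolutionary model over the species tree.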