From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

  • Authors:
  • Eleazar Eskin

  • Affiliations:
  • University of California, San Diego, CA

  • Venue:
  • RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
  • Year:
  • 2004

Abstract

An important part of deciphering gene regulatory mechanisms is discovering transcription factor binding sites. In many cases, these sites can be detected because they are overrepresented in genomic sequences. The detection of overrepresented signals in sequences, or motif finding, has become a central problem in computational biology. There are two major computational frameworks for attacking the motif-finding problem, which differ in how they represent the signals. The most popular is the profile, or PSSM (Position Specific Scoring Matrix), representation. The goal of profile-based algorithms is to obtain probabilistic representations of the overrepresented signals. The other is the consensus pattern, or pattern with mismatches, representation, which represents a signal as a discrete consensus pattern and allows some mismatches to occur in each instance of the pattern. The advantage of profiles is the expressiveness of their representation, while the advantage of the consensus pattern approach is the existence of efficient algorithms that guarantee discovery of the best patterns. In this paper we present a unified framework for motif finding which encompasses both the profile representation and the consensus pattern representation. We prove that the problem of discovering the best profiles can be solved by considering a degenerate version of the problem of finding the best consensus patterns. The main advantage of our framework is that it motivates a novel algorithm, MITRA-PSSM, which discovers profiles yet provides some of the guarantees of discovering the best signals. The algorithm searches for the best profiles with respect to information content, the same criterion used by popular algorithms such as MEME and CONSENSUS. MITRA-PSSM is specifically designed for searching for profiles in this framework and introduces a novel notion of scoring consensus patterns, discrete information content.
MITRA-PSSM is available for public use via webserver at http://www.calit2.net/compbio/mitra/.
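
To make the two representations in the abstract concrete, the sketch below collects the instances of a consensus pattern with mismatches, turns them into a profile, and scores that profile by information content. It is an illustration only and is not MITRA-PSSM: there is no branch and bound search, and the paper's discrete information content is not reproduced. The function names, the uniform background distribution, and the use of relative entropy in bits as the information content are assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_instances(pattern, sequences, max_mismatches):
    """Consensus-pattern view: every window within max_mismatches of pattern."""
    width = len(pattern)
    hits = []
    for seq in sequences:
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            if hamming(pattern, window) <= max_mismatches:
                hits.append(window)
    return hits


def build_profile(instances):
    """Profile view: per-position nucleotide frequencies of the instances."""
    width = len(instances[0])
    profile = []
    for j in range(width):
        counts = Counter(inst[j] for inst in instances)
        profile.append({c: counts[c] / len(instances) for c in ALPHABET})
    return profile


def information_content(profile, background=None):
    """Relative entropy (bits) of the profile against a background.

    Assumption: a uniform background is used when none is supplied.
    """
    if background is None:
        background = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    total = 0.0
    for column in profile:
        for c, p in column.items():
            if p > 0:  # treat 0 * log(0) as 0
                total += p * math.log2(p / background[c])
    return total


# Hypothetical toy input, not data from the paper.
sequences = ["TTACGTTT", "GGACGAGG"]
instances = find_instances("ACGT", sequences, max_mismatches=1)
profile = build_profile(instances)
score = information_content(profile)
```

On this toy input the pattern ACGT matches one exact instance and one instance with a single mismatch. The three conserved columns each contribute 2 bits and the mixed last column contributes 1 bit, which shows how a profile can express variation that a single consensus string cannot.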