A cost-aggregating integer linear program for motif finding

Authors:
Carl Kingsford; Elena Zaslavsky; Mona Singh
Affiliations:
Center for Bioinformatics & Computational Biology and Department of Computer Science, University of Maryland, College Park, MD, United States; Department of Neurology and the Center for Translational Systems Biology, Mount Sinai School of Medicine, New York, NY, United States; Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, United States
Venue:
Journal of Discrete Algorithms
Year:
2011

Abstract

In the motif finding problem, one seeks a set of mutually similar substrings within a collection of biological sequences. This is an important and widely studied problem, as such shared motifs in DNA often correspond to regulatory elements. We study a combinatorial framework where the goal is to find substrings of a given length such that the sum of their pairwise distances is minimized. We describe a novel integer linear program for the problem that exploits the fact that distances between substrings take values from a limited set; this allows pairs of sequence positions with the same distance to be considered in aggregate. We show how to tighten its linear programming relaxation by adding an exponential set of constraints, and we give an efficient separation algorithm that finds violated constraints, which shows that the tightened linear program can still be solved in polynomial time. We apply our approach to find optimal solutions to the motif finding problem and show that it is effective in practice at uncovering known transcription factor binding sites.
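To make the combinatorial objective and the aggregation idea concrete, the following Python sketch may help. It assumes, as an illustration only, that exactly one substring is chosen from each sequence and that the distance is Hamming distance, so every pairwise distance lies in the limited set 0..length. The function names (`hamming`, `windows`, `brute_force_motif`, `aggregate_by_distance`) are hypothetical. The exhaustive search is an exponential baseline for tiny inputs, not the paper's integer linear program or its separation algorithm; the grouping function only shows how position pairs sharing a distance can be collected together.

```python
from collections import defaultdict
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq: str, length: int) -> list[str]:
    """All substrings of the given length, indexed by start position."""
    return [seq[p:p + length] for p in range(len(seq) - length + 1)]


def brute_force_motif(seqs: list[str], length: int) -> tuple[int, tuple[int, ...]]:
    """Exhaustively pick one start position per sequence (assumption) to
    minimize the sum of pairwise Hamming distances. Exponential: tiny inputs only."""
    wins = [windows(s, length) for s in seqs]
    best_cost, best_choice = None, None
    for choice in product(*(range(len(w)) for w in wins)):
        cost = sum(
            hamming(wins[i][choice[i]], wins[j][choice[j]])
            for i, j in combinations(range(len(seqs)), 2)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


def aggregate_by_distance(seqs: list[str], length: int) -> dict:
    """For each sequence pair (i, j), start position p in sequence i and
    distance d, collect the start positions q in sequence j whose window is
    exactly d mismatches from window p. With Hamming distance, d ranges over
    0..length, so each (i, j, p) has at most length + 1 groups."""
    wins = [windows(s, length) for s in seqs]
    groups = defaultdict(list)  # (i, j, p, d) -> [q, ...]
    for i, j in combinations(range(len(seqs)), 2):
        for p, u in enumerate(wins[i]):
            for q, v in enumerate(wins[j]):
                groups[(i, j, p, hamming(u, v))].append(q)
    return groups
```

For example, with `seqs = ["ACGTAC", "TTACGA", "GACGTT"]` and length 4, `brute_force_motif` selects `ACGT`, `ACGA` and `ACGT` (start positions `(0, 2, 1)`) for a total cost of 2. In `aggregate_by_distance(seqs, 4)`, the window `ACGT` at position 0 of the first sequence has its three partner positions in the second sequence split into just two groups: distance 1 (position 2) and distance 4 (positions 0 and 1).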