101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem

Authors:
Giuseppe Lancia;Robert Carr;Brian Walenz;Sorin Istrail
Affiliations:
Celera Genomics, Rockville, MD and D.E.I., University of Padova;Sandia National Labs, Albuquerque, NM;Celera Genomics, Rockville, MD;Celera Genomics, Rockville, MD
Venue:
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Year:
2001

References (9)

Finding a maximum clique in an arbitrary graph. SIAM Journal on Computing.
Integer and combinatorial optimization.
Adaptation in natural and artificial systems.
Combinatorial optimization.
Genetic Algorithms in Search, Optimization and Machine Learning.
Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, Workshop, October 11-13, 1993.
Computers and Intractability: A Guide to the Theory of NP-Completeness.
Algorithmic Aspects of Protein Structure Similarity. FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
Algorithms for Weakly Triangulated Graphs.

Cited by (24)

Structural alignment of large-size proteins via Lagrangian relaxation. Proceedings of the sixth annual international conference on Computational biology.
Towards Optimally Solving the LONGEST COMMON SUBSEQUENCE Problem for Sequences with Nested Arc Annotations in Linear Time. CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching.
A grid-aware approach to protein structure comparison. Journal of Parallel and Distributed Computing - High-performance computational biology.
Computing the similarity of two sequences with nested arc annotations. Theoretical Computer Science.
A Study on the use of "self-generation" in memetic algorithms. Natural Computing: an international journal.
Self Generating Metaheuristics in Bioinformatics: The Proteins Structure Comparison Case. Genetic Programming and Evolvable Machines.
Integer programming models for computational biology problems. Journal of Computer Science and Technology - Special issue on bioinformatics.
Graph algorithms for biological systems analysis. Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms.
Efficient Sequence Alignment with Side-Constraints by Cluster Tree Elimination. Constraints.
Invariant features for searching in protein fold databases. International Journal of Computer Mathematics - Bioinformatics.
On Protein Structure Alignment under Distance Constraint. ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation.
Bimal: bipartite matching alignment for the contact map overlap problem. IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks.
A fuzzy sets based generalization of contact maps for the overlap of protein structures. Fuzzy Sets and Systems.
Joining softassign and dynamic programming for the contact map overlap problem. BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development.
A similarity matrix-based hybrid algorithm for the contact map overlaps problem. Computers in Biology and Medicine.
A Spectral Approach to Protein Structure Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
On protein structure alignment under distance constraint. Theoretical Computer Science.
Maximum cliques in protein structure comparison. SEA'10 Proceedings of the 9th international conference on Experimental Algorithms.
What makes the arc-preserving subsequence problem hard? ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II.
Fast protein structure alignment. ISBRA'10 Proceedings of the 6th international conference on Bioinformatics Research and Applications.
What makes the arc-preserving subsequence problem hard? Transactions on Computational Systems Biology II.
A parameterized algorithm for protein structure alignment. RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology.
A branch-and-reduce algorithm for the contact map overlap problem. RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology.
Predicting helix pair structure from fuzzy contact maps. Applied Soft Computing.

Abstract

Structure comparison is a fundamental problem for structural genomics. A variety of structure comparison methods have been proposed, and several protein structure classification servers, e.g., SCOP, DALI, and CATH, were designed based on them and are extensively used in practice. This area of research continues to be very active, energized biennially by the CASP folding competitions, but despite the extraordinary international research effort devoted to it, progress is slow. A fundamental dimension of this bottleneck is the absence of rigorous algorithmic methods. A recent excellent survey on structure comparison by Taylor et al. [23] records the state of the art of the area: "In structure comparison, we do not even have an algorithm that guarantees an optimal answer for pairs of structures…" In this paper we provide the first rigorous algorithm for structure comparison. Our method is based on developing an effective integer linear programming (IP) formulation of protein structure contact map overlap (CMO), and a branch-and-cut strategy that employs lower-bounding heuristics at the branch nodes. Our algorithms identified a gallery of optimal and near-optimal structure alignments for pairs of proteins from the Protein Data Bank (PDB) with up to 80 amino acids and about 150 contacts each, i.e., problem instances of size about 300. Although these sizes also reflect our current limitations, these are the first provably optimal and near-optimal algorithms in the literature for a measure of structure similarity that sees extensive practical use. At the heart of our success in finding optimal alignments is a reduction of the CMO optimization to the maximum independent set (MIS) problem on special graphs. For CMO instances of size 300, the corresponding MIS graph instance contains about 10,000 nodes.
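The CMO-to-MIS reduction can be illustrated on toy inputs. The sketch below assumes each contact map is a list of residue-index pairs (i, j) with i < j (0-based), and uses one natural construction that may differ in detail from the paper's: a node for every pair (e, f) of contacts, one from each protein, meaning "align i to u and j to v", and an edge between two nodes whenever their implied alignment lines cross or reuse a residue. Any independent set then corresponds to a valid non-crossing alignment realizing at least that many shared contacts, and vice versa, so the MIS size equals the CMO optimum. All function names are hypothetical, and the exact solver here is a plain branch-and-bound with a greedy lower bound, not the paper's branch-and-cut; it is only meant for a handful of contacts.

```python
from itertools import combinations


def compatible(l, m):
    """Alignment lines (i, u) and (j, v) can coexist in a non-crossing
    alignment iff they are identical or ordered the same way in both proteins."""
    (i, u), (j, v) = l, m
    return (i, u) == (j, v) or (i < j and u < v) or (i > j and u > v)


def conflict_graph(contacts1, contacts2):
    """Node (e, f) pairs contact e = (i, j) of protein 1 with contact
    f = (u, v) of protein 2 and implies the lines i->u and j->v.
    Two nodes are adjacent if their implied lines are not all compatible."""
    nodes = [(e, f) for e in contacts1 for f in contacts2]
    lines = {n: ((n[0][0], n[1][0]), (n[0][1], n[1][1])) for n in nodes}
    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if not all(compatible(l, m) for l in lines[a] for m in lines[b]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_independent_set(adj, cands):
    """Lower-bounding heuristic: repeatedly take a minimum-degree vertex."""
    cands, chosen = set(cands), []
    while cands:
        v = min(cands, key=lambda x: len(adj[x] & cands))
        chosen.append(v)
        cands -= adj[v] | {v}
    return chosen


def max_independent_set(adj):
    """Exact MIS by plain branch-and-bound (not branch-and-cut)."""
    best = greedy_independent_set(adj, adj.keys())  # incumbent = lower bound

    def branch(chosen, cands):
        nonlocal best
        if len(chosen) + len(cands) <= len(best):
            return  # trivial upper bound cannot beat the incumbent
        if not cands:
            best = chosen
            return
        v = max(cands, key=lambda x: len(adj[x] & cands))
        branch(chosen + [v], cands - adj[v] - {v})  # branch 1: take v
        branch(chosen, cands - {v})                  # branch 2: leave v out

    branch([], set(adj))
    return best


def cmo_via_mis(contacts1, contacts2):
    """CMO value computed as the size of a maximum independent set."""
    c1 = sorted({tuple(sorted(c)) for c in contacts1})
    c2 = sorted({tuple(sorted(c)) for c in contacts2})
    return len(max_independent_set(conflict_graph(c1, c2)))


def cmo_brute_force(n1, n2, contacts1, contacts2):
    """Reference check: enumerate every non-crossing alignment (tiny inputs only)."""
    c1 = {tuple(sorted(c)) for c in contacts1}
    c2 = {tuple(sorted(c)) for c in contacts2}
    best = 0
    for k in range(min(n1, n2) + 1):
        for a in combinations(range(n1), k):
            for b in combinations(range(n2), k):
                align = dict(zip(a, b))  # residue of protein 1 -> residue of protein 2
                score = sum((align[i], align[j]) in c2
                            for i, j in c1 if i in align and j in align)
                best = max(best, score)
    return best


if __name__ == "__main__":
    c = [(0, 2), (1, 3), (2, 4)]
    print(cmo_via_mis(c, c))  # 3: identical maps, every contact overlaps
```

The brute-force function is there only to cross-check the reduction on small cases; its running time grows combinatorially with protein length, which is exactly what the MIS formulation and a stronger bounding scheme are meant to avoid.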
While our algorithms are able to solve MIS instances with about 10,000 nodes to optimality, the known exact algorithms for MIS on general graphs can at present solve only instances with up to a few hundred nodes. This is the first effective use of IP methods in protein structure comparison; the biomolecular structure literature contains only one other effective IP method, devoted to RNA comparison, due to Lenhof et al. [18]. The hybrid heuristic approach that worked well for providing lower bounds in the branch-and-cut algorithm was tried on large proteins in a test set suggested by Jeffrey Skolnick. It involved 33 proteins classified into four families: Flavodoxin-like fold CheY-related, Plastocyanin, TIM Barrel, and Ferritin. Out of the set of all 528 pairwise structure alignments, we validated the clustering with 98.7% accuracy (1.3% false negatives and 0% false positives).
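The validation figures are pairwise: 33 proteins give C(33, 2) = 528 unordered pairs, and each pair is either correctly grouped, a false negative (same family, not grouped), or a false positive (different families, grouped). The sketch below counts these over all pairs. The rule that decides whether two proteins are grouped is left as a hypothetical `similar(a, b)` predicate (for example, a CMO score threshold), since the abstract does not specify it, and normalizing all three rates by the total number of pairs is an assumption, albeit one consistent with 98.7% + 1.3% + 0% = 100%.

```python
from itertools import combinations
from math import comb


def pairwise_validation(families, similar):
    """families: one family label per protein.
    similar(a, b): hypothetical predicate, True if the method groups proteins a and b.
    Returns (number of pairs, accuracy, false-negative rate, false-positive rate),
    with every rate taken over all unordered pairs (an assumption)."""
    n = len(families)
    fn = fp = 0
    for a, b in combinations(range(n), 2):
        same = families[a] == families[b]
        pred = similar(a, b)
        if same and not pred:
            fn += 1  # same family but not grouped together
        elif pred and not same:
            fp += 1  # grouped together but different families
    pairs = comb(n, 2)
    return pairs, 1 - (fn + fp) / pairs, fn / pairs, fp / pairs


if __name__ == "__main__":
    fam = ["A", "A", "B", "B"]
    print(pairwise_validation(fam, lambda a, b: fam[a] == fam[b]))  # (6, 1.0, 0.0, 0.0)
```

For the 33-protein set, `comb(33, 2)` returns the 528 pairs mentioned above.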