Protein Design by Sampling an Undirected Graphical Model of Residue Constraints

Authors:
John Thomas; Naren Ramakrishnan; Chris Bailey-Kellogg
Affiliations:
Dartmouth College, Hanover; Virginia Tech, Blacksburg; Dartmouth College, Hanover
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2009

Abstract

This paper develops an approach for designing protein variants by sampling sequences that satisfy residue constraints encoded in an undirected probabilistic graphical model. Due to evolutionary pressures on proteins to maintain structure and function, the sequence record of a protein family contains valuable information regarding position-specific residue conservation and coupling (or covariation) constraints. Representing these constraints with a graphical model provides two key benefits for protein design: a probabilistic semantics enabling evaluation of possible sequences for consistency with the constraints, and an explicit factorization of residue dependence and independence supporting efficient exploration of the constrained sequence space. We leverage these benefits in developing two complementary Markov chain Monte Carlo (MCMC) algorithms for protein design: constrained shuffling mixes wild-type sequences positionwise and evaluates graphical model likelihood, while component sampling directly generates sequences by sampling clique values and propagating to other cliques. We apply our methods to design WW domains. We demonstrate that likelihood under a model of wild-type WW domains is highly predictive of foldedness of new WW domains. We then show both theoretical and rapid empirical convergence of our algorithms in generating high-likelihood, diverse new sequences. We further show that these sequences capture the original sequence constraints, yielding a model as predictive of foldedness as the original one.
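
To make the constrained shuffling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy two-letter alphabet, a hand-picked coupling edge, pseudocount-smoothed frequency tables as potentials, and a Metropolis-Hastings acceptance rule. The names `fit_model`, `log_score`, and `constrained_shuffle` are hypothetical. Component sampling is not shown.

```python
import math
import random
from collections import Counter


def fit_model(msa, edges, alphabet, pseudocount=1.0):
    """Estimate smoothed single-site and pairwise frequency tables (assumed potentials)."""
    n, q = len(msa), len(alphabet)
    denom = n + pseudocount * q
    node = []
    for i in range(len(msa[0])):
        counts = Counter(s[i] for s in msa)
        node.append({a: (counts[a] + pseudocount) / denom for a in alphabet})
    pair = {}
    for i, j in edges:
        counts = Counter((s[i], s[j]) for s in msa)
        # pseudocount / q per cell keeps pairwise marginals equal to the node tables
        pair[(i, j)] = {(a, b): (counts[(a, b)] + pseudocount / q) / denom
                        for a in alphabet for b in alphabet}
    return node, pair


def log_score(seq, node, pair):
    """Log score: conservation terms plus a coupling correction per edge.

    In general this is an unnormalized score, not a normalized log-likelihood.
    """
    total = sum(math.log(node[i][a]) for i, a in enumerate(seq))
    for (i, j), table in pair.items():
        a, b = seq[i], seq[j]
        total += math.log(table[(a, b)] / (node[i][a] * node[j][b]))
    return total


def constrained_shuffle(msa, node, pair, steps=2000, seed=0):
    """Positionwise mixing of wild-type residues with Metropolis-Hastings acceptance."""
    rng = random.Random(seed)
    current = list(rng.choice(msa))
    cur_score = log_score(current, node, pair)
    samples = []
    for _ in range(steps):
        i = rng.randrange(len(current))
        column = [s[i] for s in msa]      # residues seen at position i in wild types
        old_res, new_res = current[i], rng.choice(column)
        proposal = current.copy()
        proposal[i] = new_res
        new_score = log_score(proposal, node, pair)
        # Hastings correction: a residue is proposed with its column frequency
        log_q = math.log(column.count(old_res) / column.count(new_res))
        u = 1.0 - rng.random()            # uniform in (0, 1], so log(u) is defined
        if math.log(u) < (new_score - cur_score) + log_q:
            current, cur_score = proposal, new_score
        samples.append("".join(current))
    return samples


# Toy alignment (made up): positions 0 and 1 always agree, position 2 is free.
toy_msa = ["AAB", "AAB", "BBA", "BBA", "AAA", "BBB"]
toy_node, toy_pair = fit_model(toy_msa, edges=[(0, 1)], alphabet="AB")
toy_samples = constrained_shuffle(toy_msa, toy_node, toy_pair)
```

In this toy setting, sequences that respect the coupling between positions 0 and 1 score higher than those that break it, so the sampler spends most of its time on coupled sequences. It can still reach combinations absent from the input alignment.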