Accurate prediction of enzyme mutant activity based on a multibody statistical potential

  • Authors:
  • Majid Masso; Iosif I. Vaisman

  • Venue:
  • Bioinformatics
  • Year:
  • 2007

Abstract

Motivation: An important area of research in biochemistry and molecular biology focuses on the characterization of enzyme mutants. However, synthesizing and analyzing experimental mutants is time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity.

Results: Using a four-body statistical potential function derived from Delaunay tessellation, we characterize each enzyme mutant uniquely by a perturbation vector that measures environmental changes relative to wild type (wt) at every residue position; these vectors are used for model development and prediction. First, model performance measured by the area under the receiver operating characteristic (ROC) curve (AUC) exceeds 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating the statistical significance of the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. Fourth, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants shows that 79% and 86% of the predictions, respectively, agree with the experimental results. Finally, learning curves for each mutant enzyme system show how training set size influences model performance.

Availability: Prediction databases are available at http://proteins.gmu.edu/automute/

Contact: ivaisman@gmu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
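The perturbation vector described in the Results can be illustrated with a minimal sketch. It assumes the Delaunay tetrahedra are already computed (as 4-tuples of residue indices, e.g. from one representative point per residue), scores each tetrahedron by a log-likelihood four-body potential of the form log10(f/p) with a multinomial reference p = C·a_i·a_j·a_k·a_l, sums those scores per residue into a "residual profile", and takes the mutant-minus-wild-type difference. The function names (`train_potential`, `residual_profile`, `perturbation_vector`), the log base, the sign convention, and scoring unseen quadruplets as 0 are all illustrative assumptions, not the authors' published implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Quad = Tuple[str, ...]
Tet = Tuple[int, int, int, int]


def quad_of(seq: str, tet: Sequence[int]) -> Quad:
    """Order-independent residue composition of one tetrahedron."""
    return tuple(sorted(seq[i] for i in tet))


def train_potential(structures: List[Tuple[str, List[Tet]]]) -> Dict[Quad, float]:
    """Hypothetical four-body potential: log10(f / p) per observed quadruplet.

    f: observed fraction of all tetrahedra having this composition.
    p: expected fraction under random selection, C * a_i * a_j * a_k * a_l,
       where a_x is the overall residue frequency and C = 4! / prod(t_n!).
    """
    quad_counts: Counter = Counter()
    aa_counts: Counter = Counter()
    for seq, tets in structures:
        aa_counts.update(seq)
        quad_counts.update(quad_of(seq, t) for t in tets)
    n_quads = sum(quad_counts.values())
    n_aas = sum(aa_counts.values())
    potential: Dict[Quad, float] = {}
    for quad, count in quad_counts.items():
        f = count / n_quads
        c = math.factorial(4)
        for t in Counter(quad).values():
            c //= math.factorial(t)  # multinomial coefficient, always exact
        p = float(c)
        for aa in quad:
            p *= aa_counts[aa] / n_aas
        potential[quad] = math.log10(f / p)
    return potential


def residual_profile(seq: str, tets: List[Tet], potential: Dict[Quad, float]) -> List[float]:
    """Sum the scores of all tetrahedra that contain each residue."""
    profile = [0.0] * len(seq)
    for tet in tets:
        score = potential.get(quad_of(seq, tet), 0.0)  # assumption: unseen quadruplets score 0
        for i in tet:
            profile[i] += score
    return profile


def perturbation_vector(seq: str, tets: List[Tet], potential: Dict[Quad, float],
                        pos: int, new_aa: str) -> List[float]:
    """Mutant minus wild-type residual profile, reusing the wild-type tessellation."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    wt = residual_profile(seq, tets, potential)
    mt = residual_profile(mutant, tets, potential)
    return [m - w for m, w in zip(mt, wt)]
```

Because the mutant reuses the wild-type tessellation, only the mutated position and the residues sharing a tetrahedron with it can receive nonzero components; every other entry of the vector is exactly zero.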
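The AUC figures quoted in the Results summarize how well a model's scores rank one activity class above the other. As a generic illustration (not the authors' evaluation code), the AUC can be computed directly as the probability that a randomly chosen positive example outscores a randomly chosen negative one, with ties counting one half. The function name `roc_auc` is hypothetical, and labeling one class as 1 (for instance, mutants with unaffected activity) is an assumption.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney formulation: P(score_pos > score_neg), ties = 1/2.

    labels: 1 for the positive class, 0 for the negative class.
    Runs in O(n_pos * n_neg), adequate for a few hundred mutants.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Example: one positive (0.3) is outranked by one negative (0.8), so AUC = 3/4.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.83 and 0.77 should be read.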