A scalable and accurate method for classifying protein-ligand binding geometries using a MapReduce approach

  • Authors:
  • T. Estrada;B. Zhang;P. Cicotti;R. S. Armen;M. Taufer

  • Affiliations:
  • Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, United States;Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, United States;San Diego Supercomputer Center, La Jolla, CA 92093, United States;Department of Pharmaceutical Sciences, Thomas Jefferson University School of Pharmacy, Philadelphia, PA 19107, United States;Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, United States

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2012

Abstract

We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point; the second step builds an octree by assigning an octant identifier to every point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the densest octant. We adapt our method for MapReduce and implement it in Hadoop. The load balancing, fault tolerance, and scalability of MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results of docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross-docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking.
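
The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a conformation is encoded as a 3D point, so the sketch assumes the centroid of the atom coordinates as a stand-in. The function names (`encode`, `octant_id`, `densest_octant`), the bounding box, and the fixed octree depth are likewise hypothetical. Counting points per octant identifier mirrors, in a single process, what a map step (emit an octant identifier per point) and a reduce step (sum per identifier) would do.

```python
from collections import Counter


def encode(conformation):
    """Step 1: reduce a list of (x, y, z) atom coordinates to one 3D point.

    Assumption: the centroid is used as a placeholder encoding.
    """
    n = len(conformation)
    return tuple(sum(atom[axis] for atom in conformation) / n for axis in range(3))


def octant_id(point, lo, hi, depth):
    """Step 2: return a string of `depth` digits (0-7), one per octree level.

    `lo` and `hi` are opposite corners of the bounding box (assumed known).
    """
    lo, hi = list(lo), list(hi)
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                digit |= 1 << axis  # upper half along this axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        digits.append(str(digit))
    return "".join(digits)


def densest_octant(points, lo, hi, depth):
    """Step 3: count points per octant identifier and return the fullest one."""
    counts = Counter(octant_id(p, lo, hi, depth) for p in points)
    return counts.most_common(1)[0]  # (identifier, number of points)


if __name__ == "__main__":
    # Three toy "conformations", each a short list of atom coordinates.
    conformations = [
        [(0.0, 0.0, 0.0), (0.2, 0.2, 0.2)],
        [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3)],
        [(0.8, 0.8, 0.8), (1.0, 1.0, 1.0)],
    ]
    points = [encode(c) for c in conformations]
    print(densest_octant(points, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), depth=1))
```

In this sketch, points sharing a longer identifier prefix lie in the same smaller octant, so increasing `depth` refines the clustering at the cost of fewer points per octant.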