Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point; the second step builds an octree by assigning an octant identifier to every point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the densest octant. We adapt our method for MapReduce and implement it in Hadoop. The load balancing, fault tolerance, and scalability of MapReduce allow screening of very large conformation spaces that are not approachable with traditional clustering methods. We analyze results for docking and cross-docking for a series of HIV protease inhibitors. Our method demonstrates significant improvement over "energy-only" scoring for the accurate identification of native ligand geometries. The advantages of this approach make it attractive for complex applications in real-world drug design efforts.
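The second and third steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each conformation has already been reduced to a 3D point, that the points lie in a known axis-aligned bounding box, and that the octree is explored to a fixed depth. The names `octant_id` and `densest_octant`, and the digit-string form of the identifier, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def octant_id(point, lo, hi, depth):
    """Return a string of `depth` digits (0-7) locating `point` in the box [lo, hi].

    Hypothetical encoding: at each level the box is halved along x, y, and z,
    and bit `axis` of the digit is set when the point lies in the upper half.
    """
    lo, hi = list(lo), list(hi)
    digits = []
    for _ in range(depth):
        digit = 0
        for axis in range(3):
            mid = (lo[axis] + hi[axis]) / 2.0
            if point[axis] >= mid:
                digit |= 1 << axis
                lo[axis] = mid   # descend into the upper half on this axis
            else:
                hi[axis] = mid   # descend into the lower half on this axis
        digits.append(str(digit))
    return "".join(digits)


def densest_octant(points, lo, hi, depth):
    """Return (octant identifier, point count) for the most populated octant.

    Mirrors a map step (one identifier per point) followed by a reduce step
    (a count per identifier); here both run in a single process.
    """
    counts = Counter(octant_id(p, lo, hi, depth) for p in points)
    return counts.most_common(1)[0]


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
    print(densest_octant(pts, (0, 0, 0), (1, 1, 1), depth=2))
```

Because each point's identifier is computed independently and the counts are a plain sum per key, the same logic would map naturally onto a MapReduce job, with the identifier as the key. The fixed depth is a simplification made for this sketch.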