Journal of Computational Chemistry
BOINC: A System for Public-Resource Computing and Storage
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
Concurrency and Computation: Practice & Experience - Third IEEE International Workshop on High Performance Computational Biology (HiCOMB 2004)
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Materialized community ground models for large-scale earthquake simulation
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A massively parallel adaptive fast-multipole method on heterogeneous architectures
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
K-Means Clustering with Bagging and MapReduce
HICSS '11 Proceedings of the 2011 44th Hawaii International Conference on System Sciences
Fast clustering using MapReduce
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering very large multi-dimensional datasets with MapReduce
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Reengineering High-throughput Molecular Datasets for Scalable Clustering Using MapReduce
HPCC '12 Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
Hi-index | 0.00 |
We present a scalable and accurate method for classifying protein-ligand binding geometries in molecular docking. Our method is a three-step process: the first step encodes the geometry of a three-dimensional (3D) ligand conformation into a single 3D point in the space; the second step builds an octree by assigning an octant identifier to every single point in the space under consideration; and the third step performs an octree-based clustering on the reduced conformation space and identifies the most dense octant. We adapt our method for MapReduce and implement it in Hadoop. The load-balancing, fault-tolerance, and scalability in MapReduce allow screening of very large conformation spaces not approachable with traditional clustering methods. We analyze results for docking trials for 23 protein-ligand complexes for HIV protease, 21 protein-ligand complexes for Trypsin, and 12 protein-ligand complexes for P38alpha kinase. We also analyze cross docking trials for 24 ligands, each docking into 24 protein conformations of the HIV protease, and receptor ensemble docking trials for 24 ligands, each docking in a pool of HIV protease receptors. Our method demonstrates significant improvement over energy-only scoring for the accurate identification of native ligand geometries in all these docking assessments. The advantages of our clustering approach make it attractive for complex applications in real-world drug design efforts. We demonstrate that our method is particularly useful for clustering docking results using a minimal ensemble of representative protein conformational states (receptor ensemble docking), which is now a common strategy to address protein flexibility in molecular docking.