A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories

Authors:
Tiankai Tu;Charles A. Rendleman;David W. Borhani;Ron O. Dror;Justin Gullingsrud;Morten Ø. Jensen;John L. Klepeis;Paul Maragakis;Patrick Miller;Kate A. Stafford;David E. Shaw
Affiliations:
D. E. Shaw Research, New York, NY (all authors)
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

As parallel algorithms and architectures drive the longest molecular dynamics (MD) simulations towards the millisecond scale, traditional sequential post-simulation data analysis methods are becoming increasingly untenable. Inspired by the programming interface of Google's MapReduce, we have built a new parallel analysis framework called HiMach, which allows users to write trajectory analysis programs sequentially, and carries out the parallel execution of the programs automatically. We introduce (1) a new MD trajectory data analysis model that is amenable to parallel processing, (2) a new interface for defining trajectories to be analyzed, (3) a novel method to make use of an existing sequential analysis tool called VMD, and (4) an extension to the original MapReduce model to support multiple rounds of analysis. Performance evaluations on up to 512 cores demonstrate the efficiency and scalability of the HiMach framework on a Linux cluster.
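
As a rough illustration of the MapReduce-style model the abstract describes, the sketch below runs two rounds of analysis over a toy trajectory. It is not the HiMach interface: every name here (make_trajectory, run_round, map_distance, map_above, and so on) is hypothetical, the trajectory is synthetic, and a thread pool merely stands in for parallel execution on a cluster. The user-written map and reduce functions are ordinary sequential code; only run_round knows about parallelism. The second round depends on the output of the first, which is the kind of multi-round analysis the abstract mentions.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def make_trajectory(n_frames=8):
    """Hypothetical toy trajectory: a list of (frame_index, positions) pairs.

    Atom 0 sits at the origin; atom 1 moves along x by 0.5 per frame.
    """
    return [(t, [(0.0, 0.0, 0.0), (1.0 + 0.5 * t, 0.0, 0.0)]) for t in range(n_frames)]


def run_round(items, mapper, reducer, workers=4):
    """Run one map/reduce round; threads stand in for cluster workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, items))  # map phase: one call per frame
    groups = defaultdict(list)
    for pairs in mapped:  # shuffle phase: group emitted values by key
        for key, value in pairs:
            groups[key].append(value)
    # reduce phase: one call per key
    return {key: reducer(key, values) for key, values in groups.items()}


def map_distance(frame):
    """Round 1 map: emit the distance between atoms 0 and 1 for this frame."""
    t, positions = frame
    return [("dist_0_1", (t, math.dist(positions[0], positions[1])))]


def reduce_mean(key, values):
    """Round 1 reduce: average the per-frame distances."""
    distances = [d for _, d in values]
    return sum(distances) / len(distances)


def map_above(threshold, frame):
    """Round 2 map: emit 1 for frames whose distance exceeds the threshold."""
    t, positions = frame
    d = math.dist(positions[0], positions[1])
    return [("above_mean", 1)] if d > threshold else []


def reduce_count(key, values):
    """Round 2 reduce: count the frames that were emitted."""
    return sum(values)


trajectory = make_trajectory()
round1 = run_round(trajectory, map_distance, reduce_mean)
# Round 2 reuses the round 1 result as its threshold.
round2 = run_round(trajectory, partial(map_above, round1["dist_0_1"]), reduce_count)
print(round1, round2)
```

Because the per-frame functions carry no shared state, the same map and reduce code could in principle be handed to a different executor without modification; that separation is the point of the sketch, not the particular thread-pool mechanism used here.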