Some of the most challenging applications to parallelize scalably are those that perform a relatively small amount of computation per iteration. Attaining high parallel efficiency in such cases requires identifying and solving multiple interacting performance challenges. We present case studies of efforts to scale two such applications to large numbers of processors: NAMD, a parallel classical molecular dynamics application for large biomolecular systems, and CPAIMD, a Car-Parrinello ab initio molecular dynamics application. Both applications are implemented in Charm++, and the performance analysis was carried out using Projections, the performance visualization and analysis tool associated with Charm++. We showcase a series of optimizations facilitated by Projections. The resulting performance of NAMD led to a Gordon Bell award at SC 2002, with unprecedented speedup on 3000 processors and teraflops-level peak performance. We also explore techniques for applying the performance visualization and analysis tool on future-generation extreme-scale parallel machines and discuss the scalability issues with Projections.