Parallel tempering (PT), also known as replica exchange, is a powerful Markov chain Monte Carlo (MCMC) sampling approach that aims to reduce the relaxation time in simulations of physical systems. In this paper, we present a novel decentralized parallel implementation of PT using the Message Passing Interface (MPI) and the Scalable Parallel Random Number Generators (SPRNG) library. By exploiting the characteristics of pseudorandom number generators, this implementation eliminates global synchronization and reduces the interprocessor-communication overhead incurred by replica exchange in PT. Moreover, our proposed non-blocking replica exchange reduces communication overhead in pairwise replica exchanges by allowing the process that reaches the replica exchange point first to leap ahead while waiting for its partner to arrive at the common exchange point. In addition, we propose exchanging temperatures instead of conformations (replicas), which further reduces communication and balances the load across the processors participating in the PT computation. Together, these techniques make PT efficiently applicable to large-scale massively parallel systems. We demonstrate the efficiency of this parallel PT implementation by minimizing various benchmark functions with complicated landscapes as objective functions. Our computational results and analysis show that the decentralized PT is scalable, reproducible, and load-balanced, and incurs insignificant communication overhead.
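To make the sampling scheme concrete, the following is a minimal single-process sketch of parallel tempering used as a global optimizer. It is illustrative only and makes several assumptions not taken from the paper: the paper's implementation is decentralized over MPI with per-process SPRNG streams, whereas here all replicas live in one Python process; the geometric temperature ladder, the Gaussian proposal width, and the function name `parallel_tempering` are our own choices. It does, however, reflect the paper's idea of swapping temperatures (cheap scalars) between replicas rather than exchanging conformations.

```python
import math
import random

def parallel_tempering(energy, n_replicas=4, n_sweeps=2000, seed=0):
    """Single-process sketch of parallel tempering (PT) for minimization.

    Illustrative only: a real decentralized implementation would run one
    replica per MPI process, each with its own SPRNG stream.
    `energy` is the objective function (lower is better).
    """
    rng = random.Random(seed)
    # Geometric temperature ladder (a common, but not the only, choice).
    temps = [2.0 ** i for i in range(n_replicas)]
    states = [rng.uniform(-5.0, 5.0) for _ in range(n_replicas)]
    best = min(states, key=energy)

    for _ in range(n_sweeps):
        # Local Metropolis move for each replica at its current temperature.
        for i in range(n_replicas):
            proposal = states[i] + rng.gauss(0.0, 0.5)
            dE = energy(proposal) - energy(states[i])
            if dE <= 0 or rng.random() < math.exp(-dE / temps[i]):
                states[i] = proposal

        # Replica exchange between a random adjacent pair.  Following the
        # paper's idea, we swap *temperatures* rather than conformations;
        # with MPI this avoids shipping large state vectors between ranks.
        j = rng.randrange(n_replicas - 1)
        d_beta = 1.0 / temps[j] - 1.0 / temps[j + 1]
        dE = energy(states[j]) - energy(states[j + 1])
        if rng.random() < min(1.0, math.exp(d_beta * dE)):
            temps[j], temps[j + 1] = temps[j + 1], temps[j]

        # Track the best solution seen so far (PT as a global optimizer).
        cand = min(states, key=energy)
        if energy(cand) < energy(best):
            best = cand

    return best
```

Applied to a multimodal benchmark such as the double well `(x**2 - 4)**2`, the hot replicas cross the barrier between the two minima while the cold replica refines whichever basin it currently holds; the temperature swaps let a conformation migrate down the ladder without any replica state being communicated.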