Bayesian analysis using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms improves the measurement accuracy, resolution and sensitivity of full waveform laser detection and ranging (LaDAR), but at a significant computational cost. Parallel processing could substantially reduce the processing time. Several strategies exist for parallelizing Markov chain Monte Carlo (MCMC), but adapting them to RJMCMC may degrade parallel performance. In this paper, we describe an approach to parallel RJMCMC processing that combines data and sampling parallelism in a single framework. This approach, Data Parallel State Space Decomposed RJMCMC (DP SSD-RJMCMC), adapts to different parallel cluster sizes, improves sampling efficiency and maintains parameter estimation accuracy. Formally, it forms a group of parallel chains by decomposing the state space into subsets of parameter space. Each subset has a different but restricted dimensionality, and is assigned an independent chain of variable length. To further improve load balancing, we also employ data decomposition, forming a task queue and conducting dynamic task allocation. The MPI-based implementation on a 32-node Beowulf cluster yields a significant speedup, typically on the order of 15 to 25 times, while maintaining the estimation accuracy.
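The combination of state space decomposition and a dynamically allocated task queue can be illustrated with a minimal Python sketch. It is a single-process simulation, not the MPI implementation described above. The function names, the contiguous split of model dimensions, the cost model passed as `chain_length`, and the longest-first ordering of the queue are all assumptions made for illustration.

```python
import heapq
from itertools import product


def decompose_state_space(k_max, width):
    """Split model dimensions 0..k_max into contiguous subsets.

    Each subset covers at most `width` dimensions (restricted dimensionality).
    The contiguous split is an assumption of this sketch.
    """
    return [range(lo, min(lo + width, k_max + 1))
            for lo in range(0, k_max + 1, width)]


def build_task_queue(n_signals, subsets, chain_length):
    """Create one task per (signal, subset) pair.

    `chain_length(subset)` is a hypothetical cost model giving the number
    of iterations for the chain restricted to that subset.
    """
    tasks = [(sig, sub, chain_length(sub))
             for sig, sub in product(range(n_signals), subsets)]
    # Assumed ordering: longest chains first, so short tasks fill gaps later.
    return sorted(tasks, key=lambda task: -task[2])


def allocate_dynamically(tasks, n_workers):
    """Simulate dynamic allocation: the next task goes to the first idle worker.

    Returns the per-worker schedule and the simulated makespan.
    """
    free_at = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(free_at)
    schedule = {worker: [] for worker in range(n_workers)}
    for sig, sub, cost in tasks:
        time_free, worker = heapq.heappop(free_at)  # earliest idle worker
        schedule[worker].append((sig, sub))
        heapq.heappush(free_at, (time_free + cost, worker))
    makespan = max(time_free for time_free, _ in free_at)
    return schedule, makespan


if __name__ == "__main__":
    subsets = decompose_state_space(k_max=7, width=3)
    tasks = build_task_queue(4, subsets, lambda sub: 1000 * sub.stop)
    schedule, makespan = allocate_dynamically(tasks, n_workers=4)
    serial = sum(cost for _, _, cost in tasks)
    print(f"simulated speedup: {serial / makespan:.1f}x")
```

In this toy setting every worker ends up with the same total chain length, so the simulated speedup equals the worker count. With real chains of variable length the balance is only approximate, which is why tasks are handed out as workers become idle rather than assigned in advance.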