Parallel Exact Inference on the Cell Broadband Engine Processor
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing
We present the design and implementation of a parallel exact inference algorithm on the Cell Broadband Engine (Cell BE) processor, a heterogeneous multicore architecture. Exact inference is a key problem in probabilistic graphical models, and its computational complexity grows dramatically with network structure and clique size. In this paper, we exploit parallelism in exact inference at multiple levels. We propose a rerooting method that minimizes the critical path of exact inference, and an efficient scheduler that dynamically allocates SPEs. In addition, we explore potential table representations and layouts that optimize DMA transfers between the local store and main memory. We implemented the proposed method and conducted experiments on the Cell BE processor in an IBM QS20 blade, achieving speedups of up to 10x over state-of-the-art processors. The methodology proposed in this paper can also be used for online scheduling of directed acyclic graph (DAG) structured computations.
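The abstract mentions scheduling DAG-structured computations by minimizing the critical path. The paper's actual SPE scheduler is not reproduced here; the following is only a minimal sketch of the general idea it alludes to, namely list scheduling where the ready task with the longest remaining critical path is dispatched first. All names and the task/edge representation are hypothetical, and task weights stand in for clique-table computation times.

```python
from collections import defaultdict

def critical_path_lengths(tasks, succs):
    """Longest weighted path from each task to a sink of the DAG.

    tasks: dict task -> execution-time weight
    succs: dict task -> list of successor tasks
    """
    memo = {}
    def length(t):
        if t not in memo:
            memo[t] = tasks[t] + max((length(s) for s in succs.get(t, [])),
                                     default=0)
        return memo[t]
    return {t: length(t) for t in tasks}

def schedule(tasks, succs, num_workers):
    """Greedy list scheduling: repeatedly dispatch the ready task with the
    longest remaining critical path to the earliest-free worker.
    Returns the dispatch order and the resulting makespan."""
    prio = critical_path_lengths(tasks, succs)
    preds = defaultdict(set)
    for t, ss in succs.items():
        for s in ss:
            preds[s].add(t)
    ready = [t for t in tasks if not preds[t]]   # tasks with no predecessors
    worker_free = [0.0] * num_workers            # time each worker becomes idle
    finish, order, done = {}, [], set()
    while ready:
        t = max(ready, key=lambda x: prio[x])    # longest critical path first
        ready.remove(t)
        w = min(range(num_workers), key=lambda i: worker_free[i])
        start = max([worker_free[w]] + [finish[p] for p in preds[t]])
        finish[t] = start + tasks[t]
        worker_free[w] = finish[t]
        order.append(t)
        done.add(t)
        for s in succs.get(t, []):               # release newly ready tasks
            if preds[s] <= done and s not in done and s not in ready:
                ready.append(s)
    return order, max(finish.values())
```

On a unit-weight diamond DAG (A before B and C, both before D) with two workers, this dispatches A first and finishes in three time units, matching the DAG's critical path of three tasks.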