Special Issue: Compilers for Parallel Computing (CPC 2010)
Concurrency and Computation: Practice & Experience
We show that the bulk synchronous parallel (BSP) model, originally designed for distributed-memory systems, is also applicable to shared-memory multicore systems and, furthermore, that BSP libraries are useful for scientific computing on these systems. A proof-of-concept MulticoreBSP library has been implemented in Java and is used to show that BSP algorithms can attain proper speedups on multicore architectures. The library is based on the BSPlib implementation, adapted to an object-oriented setting. Compared with BSPlib, the number of function primitives is reduced and the overall design is simpler. We detail the application of the BSP model and the library to the sparse matrix–vector (SpMV) multiplication problem, and show through numerical experiments that the resulting BSP SpMV algorithm attains speedups, in one case reaching 3.5 on 4 threads. Although not described in detail in this paper, algorithms for the fast Fourier transform and the dense LU decomposition are also investigated, in one case attaining a superlinear speedup of 5 on 4 threads. The predictability of BSP algorithms in the case of SpMV is also investigated. Copyright © 2011 John Wiley & Sons, Ltd.
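To make the superstep structure of a BSP-style SpMV on a shared-memory machine concrete, the following is a minimal sketch in plain Java. It does not use the MulticoreBSP library or its API; the class name BspSpmvSketch, the method multiply, the CSR storage, and the contiguous block distribution are all assumptions chosen for illustration, and a CyclicBarrier from java.util.concurrent stands in for the BSP synchronisation. Each thread first publishes the vector entries it owns (communication superstep), all threads synchronise, and each thread then computes its own block of rows (computation superstep).

```java
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

/** Illustrative sketch only; hypothetical names, not the MulticoreBSP API. */
final class BspSpmvSketch {

    /** Computes y = A x for a CSR matrix A using p threads and two supersteps. */
    static double[] multiply(int[] rowPtr, int[] colIdx, double[] val,
                             double[] x, int p) {
        final int n = rowPtr.length - 1;          // number of rows
        final int m = x.length;                   // number of columns
        final double[] xShared = new double[m];   // communication buffer
        final double[] y = new double[n];
        final CyclicBarrier barrier = new CyclicBarrier(p);
        Thread[] threads = new Thread[p];

        for (int s = 0; s < p; s++) {
            final int pid = s;
            threads[s] = new Thread(() -> {
                try {
                    // Superstep 1 (communication): publish the block of x
                    // that this thread is assumed to own.
                    int xLo = (int) ((long) pid * m / p);
                    int xHi = (int) ((long) (pid + 1) * m / p);
                    System.arraycopy(x, xLo, xShared, xLo, xHi - xLo);
                    barrier.await();              // end of superstep 1

                    // Superstep 2 (computation): multiply the local row block.
                    int lo = (int) ((long) pid * n / p);
                    int hi = (int) ((long) (pid + 1) * n / p);
                    for (int i = lo; i < hi; i++) {
                        double sum = 0.0;
                        for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                            sum += val[k] * xShared[colIdx[k]];
                        }
                        y[i] = sum;
                    }
                    barrier.await();              // end of superstep 2
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[s].start();
        }
        try {
            for (Thread t : threads) {
                t.join();                         // wait for all threads
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return y;
    }

    public static void main(String[] args) {
        // 3x3 example matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form.
        int[] rowPtr = {0, 2, 3, 5};
        int[] colIdx = {0, 2, 1, 0, 2};
        double[] val = {2, 1, 3, 4, 5};
        double[] x = {1, 2, 3};
        System.out.println(Arrays.toString(multiply(rowPtr, colIdx, val, x, 2)));
    }
}
```

The contiguous row blocks are the simplest possible distribution. A sketch like this says nothing about load balance or cache behaviour, which depend on how the nonzeros are partitioned over the threads.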