A parallel lattice Boltzmann method (pLBM) based on hierarchical spatial decomposition is designed to perform large-scale flow simulations. The algorithm uses a critical-section-free dual representation to expose maximal concurrency and data locality. The performance of two emerging multi-core platforms, the PlayStation3 (Cell Broadband Engine) and the Compute Unified Device Architecture (CUDA), is tested using pLBM, which is implemented with multi-threaded and message-passing programming. The results show that pLBM achieves good performance improvements: 11.02 for Cell over a traditional Xeon cluster and 8.76 for a CUDA graphics processing unit (GPU) over a Sempron central processing unit (CPU). The results provide some insights into application design on future many-core platforms.
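
The abstract does not spell out how the critical-section-free update is organized, so the following is only a minimal sketch of one common way to get that property in a lattice Boltzmann code, not the authors' implementation. It assumes a D2Q9 lattice with a BGK collision term and periodic boundaries, and it keeps two copies of the distribution functions: each cell reads ("pulls") from the source copy and writes only its own entry in the destination copy, so no two workers ever write the same location. All names (`step`, `run`, `total_mass`, `tau`) are hypothetical, and the two-buffer pull scheme may differ from the paper's dual representation.

```python
# D2Q9 lattice: discrete velocities and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """BGK equilibrium distributions for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def step(src, dst, nx, ny, tau):
    """One pull-style update: read from src, write only dst[x][y].

    Each (x, y) iteration is independent of the others, so the loop body
    could be assigned to separate threads without locks (assumption: this
    is what makes the update free of critical sections).
    """
    for x in range(nx):
        for y in range(ny):
            # Streaming (pull): gather the population arriving along each
            # direction from the upstream neighbor, with periodic wrap.
            f = [src[(x - cx) % nx][(y - cy) % ny][i]
                 for i, (cx, cy) in enumerate(VELOCITIES)]
            # Macroscopic density and velocity of this cell.
            rho = sum(f)
            ux = sum(fi * c[0] for fi, c in zip(f, VELOCITIES)) / rho
            uy = sum(fi * c[1] for fi, c in zip(f, VELOCITIES)) / rho
            # Collision: relax toward equilibrium with relaxation time tau.
            feq = equilibrium(rho, ux, uy)
            dst[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def total_mass(grid):
    """Sum of all populations over the whole grid."""
    return sum(sum(cell) for column in grid for cell in column)


def run(nx=8, ny=8, steps=10, tau=0.8):
    """Relax a small density bump on a periodic grid; returns the final grid."""
    src = [[equilibrium(1.0, 0.0, 0.0) for _ in range(ny)] for _ in range(nx)]
    src[nx // 2][ny // 2] = equilibrium(1.2, 0.0, 0.0)  # density bump
    dst = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for _ in range(steps):
        step(src, dst, nx, ny, tau)
        src, dst = dst, src  # swap the roles of the two copies
    return src
```

Calling `run()` and comparing `total_mass` before and after is a quick sanity check, since both streaming on a periodic grid and BGK collision conserve mass. The cost of this design is a second copy of the lattice in memory; in exchange, the per-cell work needs no synchronization beyond the barrier implied by swapping the buffers between steps.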