This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) to offload the most time-consuming operators of both the far-field and near-field computations directly and efficiently onto the heterogeneous cores of the Cell. We first detail the difficulties that had to be solved, then describe the resulting deployment, which supports both single and double precision, scales linearly across several Cell blades, and handles both uniform and non-uniform particle distributions. We also present performance results, comparisons with multicore CPUs, and the limitations of our deployment on the Cell processor.
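To make the idea of a matrix formulation with BLAS routines concrete, the sketch below shows the general principle only: when several expansion vectors must be multiplied by the same operator matrix, they can be stacked as columns so that many matrix-vector products become one matrix-matrix product, which is the shape of work a level 3 BLAS routine such as GEMM handles. It is not taken from the FMB code. The names `T`, `expansions`, `p`, `k`, and `batched_transfer` are hypothetical, the matrix entries are random placeholders, and real expansions are used for simplicity. Plain Python loops stand in for the BLAS calls.

```python
import random


def matvec(T, m):
    """Apply the operator matrix T (a list of rows) to one expansion vector m."""
    return [sum(t * x for t, x in zip(row, m)) for row in T]


def matmat(T, M):
    """Multiply T by M, where M stores one expansion per column."""
    cols = list(zip(*M))  # columns of M
    return [[sum(t * x for t, x in zip(row, col)) for col in cols] for row in T]


def batched_transfer(T, expansions):
    """Stack the expansions as columns, do one matrix-matrix product, unstack."""
    M = [list(row) for row in zip(*expansions)]  # p x k, column j = expansions[j]
    L = matmat(T, M)                             # p x k result
    return [list(col) for col in zip(*L)]        # back to k vectors of length p


random.seed(0)
p, k = 6, 4  # hypothetical: expansion length and number of cells sharing T
T = [[random.random() for _ in range(p)] for _ in range(p)]  # placeholder operator
expansions = [[random.random() for _ in range(p)] for _ in range(k)]

# One matrix-vector product per cell (level 2 BLAS style).
one_by_one = [matvec(T, m) for m in expansions]

# A single matrix-matrix product for all cells (level 3 BLAS style).
batched = batched_transfer(T, expansions)

max_diff = max(
    abs(a - b) for u, v in zip(one_by_one, batched) for a, b in zip(u, v)
)
```

Both paths compute the same values. The batched form simply exposes a larger, more regular unit of work, which is generally easier to hand to an optimized BLAS implementation on an accelerator core.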