An (almost) direct deployment of the Fast Multipole Method on the Cell processor

Authors:
Pierre Fortin;Jean-Luc Lamotte
Affiliations:
LIP6, UPMC Univ. Paris 06 and CNRS UMR 7606, Paris cedex 05, France 75252;LIP6, UPMC Univ. Paris 06 and CNRS UMR 7606, Paris cedex 05, France 75252
Venue:
The Journal of Supercomputing
Year:
2013

Abstract

This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) to directly and efficiently offload the most time-consuming operators of both the far-field and near-field computations onto the Cell's heterogeneous cores. We detail the difficulties that first had to be overcome, and we obtain a deployment in single and double precision that scales linearly across several Cell blades and handles both uniform and non-uniform particle distributions. We also present our performance results and comparisons with multicore CPUs, as well as the limitations of our deployment on the Cell processor.
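The abstract relies on expressing FMM operators as dense linear algebra so that they map onto BLAS routines. The following is a minimal sketch of one common way to reach level 3 BLAS, assuming (as a simplification, not the FMB code's actual data layout) that expansions are real vectors of length n and that the multipole-to-local (M2L) operator for one fixed translation vector is a dense n x n matrix T. Applying T to each cell's multipole expansion separately is a series of matrix-vector products. Stacking all expansions that share the same translation vector as columns of one matrix turns the whole batch into a single matrix-matrix product. The helper names `matvec`, `gemm` and `m2l_batched` are hypothetical, and the naive `gemm` stands in for a tuned BLAS `dgemm`/`zgemm` call.

```python
# Hypothetical sketch: batching M2L translations that share one transfer matrix.
# Assumptions (not taken from the FMB code): expansions are real vectors of
# length n, and the M2L operator for a fixed translation vector is a dense
# n x n matrix T. Real FMM codes typically use complex coefficients.

from typing import List

Matrix = List[List[float]]


def matvec(t: Matrix, m: List[float]) -> List[float]:
    """One M2L translation as a matrix-vector product (BLAS level 2 style)."""
    return [sum(t_ij * m_j for t_ij, m_j in zip(row, m)) for row in t]


def gemm(t: Matrix, b: Matrix) -> Matrix:
    """Naive C = T * B; a tuned BLAS dgemm/zgemm would replace this loop nest."""
    n, k, cols = len(t), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(n)]
    for i in range(n):
        for p in range(k):
            t_ip = t[i][p]
            for j in range(cols):
                c[i][j] += t_ip * b[p][j]
    return c


def m2l_batched(t: Matrix, multipoles: List[List[float]]) -> List[List[float]]:
    """Apply T to every multipole expansion at once.

    The expansions are stacked as the columns of one matrix B, so the whole
    batch becomes a single matrix-matrix product T * B (BLAS level 3 style).
    """
    b = [list(col) for col in zip(*multipoles)]  # n x k, column j = expansion j
    c = gemm(t, b)
    return [list(col) for col in zip(*c)]  # one local expansion per cell


if __name__ == "__main__":
    T = [[1.0, 2.0], [0.5, -1.0]]
    M = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]  # three cells, n = 2
    batched = m2l_batched(T, M)
    one_by_one = [matvec(T, m) for m in M]
    print(batched == one_by_one)
```

Both paths compute the same local expansions; the batched form simply exposes more arithmetic per memory access, which is what makes a highly tuned BLAS kernel worthwhile on accelerator cores with small local stores.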