We present an inter-architectural comparison of single- and double-precision direct n-body implementations on modern multicore platforms, including those based on the Intel Nehalem and AMD Barcelona systems, the Sony-Toshiba-IBM PowerXCell/8i processor, and NVIDIA Tesla C870 and C1060 GPU systems. We compare our implementations across platforms on a variety of proxy measures, including performance, coding complexity, and energy efficiency.
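For readers unfamiliar with the term, a direct n-body computation evaluates every pairwise interaction explicitly, at a cost of O(n^2) per step. The following is a minimal reference sketch in plain Python, not the implementation evaluated here. The function name `direct_accelerations`, the gravitational force law, the `softening` parameter, and the unit constant `G` are all assumptions made for illustration.

```python
import math


def direct_accelerations(pos, mass, softening=1e-3, G=1.0):
    """Return per-body accelerations from an all-pairs (direct) sum.

    pos  : list of (x, y, z) tuples, one per body
    mass : list of masses, same length as pos
    softening, G : illustrative assumptions, not taken from the paper
    """
    n = len(pos)
    eps2 = softening * softening
    acc = []
    for i in range(n):
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        # Inner loop visits every other body: n * (n - 1) interactions total.
        for j in range(n):
            if j == i:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            # Softened squared distance avoids division by zero at close range.
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc


if __name__ == "__main__":
    # Two bodies one unit apart on the x-axis, softening disabled.
    print(direct_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                               [1.0, 2.0], softening=0.0))
```

The inner loop is the part that tuned versions typically vectorize, tile, or map onto GPU threads. A scalar version like this one is useful mainly as a correctness baseline.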