Applications on emerging paradigms in parallel computing
Developing high-performance applications on emerging multi- and many-core architectures requires efficient mapping techniques and architecture-specific tuning methodologies to realize performance close to their peak compute capability and memory bandwidth. In this paper, we develop architecture-aware methods to accelerate all-pairs computations on many-core graphics processors. Pairwise computations occur frequently in numerous application areas of scientific computing. Although they appear easy to parallelize, since each pairwise interaction can be computed independently of all others, obtaining high performance requires techniques that address multi-layered memory hierarchies, mappings that work within the restrictions imposed by the small, low-latency on-chip memories, and a careful balance among concurrency, data reuse, and memory traffic. We present a hierarchical decomposition scheme for GPUs based on decomposing the output matrix and the input data. We demonstrate that careful tuning of the associated decomposition parameters is essential to achieving high efficiency on GPUs. We also compare the performance of our strategies with an implementation on the STI Cell processor and with multi-core CPU parallelizations using OpenMP and Intel Threading Building Blocks.
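The hierarchical decomposition described above can be illustrated with a minimal sketch: the n-by-n output matrix is partitioned into tiles, and for each tile the corresponding slices of the input are staged into small buffers (standing in for a GPU's on-chip shared memory) before the pairwise kernel runs on that tile. The names here (`TILE`, `pairwise_tiles`, `euclidean`) are illustrative, not taken from the paper, and a serial Python loop is used purely to show the data movement pattern; a real GPU implementation would map tiles to thread blocks.

```python
# Hypothetical sketch of a tiled all-pairs computation: the output matrix
# is decomposed into TILE x TILE blocks, and the input slices for each
# block are staged once (modeling on-chip reuse) before the inner kernel.

import math

TILE = 4  # block edge; on a GPU this would match the thread-block size


def euclidean(a, b):
    """Example pairwise kernel: Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_tiles(points, kernel=euclidean, tile=TILE):
    """Compute the full n x n matrix of kernel(points[i], points[j]),
    iterating over output tiles rather than individual entries."""
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            # Stage the input slices for this output tile once;
            # every entry of the tile reuses these staged buffers.
            rows = points[bi:bi + tile]
            cols = points[bj:bj + tile]
            for i, p in enumerate(rows):
                for j, q in enumerate(cols):
                    out[bi + i][bj + j] = kernel(p, q)
    return out
```

The tile size plays the role of one of the decomposition parameters the paper tunes: larger tiles increase reuse of staged input but must fit within the limited on-chip memory.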