Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems

Authors:
Frederico Pratas;Pedro Trancoso;Leonel Sousa;Alexandros Stamatakis;Guochun Shi;Volodymyr Kindratenko
Affiliations:
SiPS, INESC-ID/IST, Universidade Técnica de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal;CASPER, Department of Computer Science, University of Cyprus, P.O. Box 20537, CY 1678 Nicosia, Cyprus;SiPS, INESC-ID/IST, Universidade Técnica de Lisboa, Rua Alves Redol 9, 1000-029 Lisbon, Portugal;The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany;National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, 1205 West Clark Street, Urbana, IL 61801, USA;National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, 1205 West Clark Street, Urbana, IL 61801, USA
Venue:
Parallel Computing
Year:
2012

Abstract

Applications today exhibit increasing computational demands, while a large variety of parallel processor systems is available. In this paper we focus on exploiting fine-grain parallelism in three applications with distinct characteristics: a Bioinformatics application (MrBayes), a Molecular Dynamics application (NAMD), and a database application (TPC-H). We assess the performance of the three applications side by side on general-purpose multi-core processors, the Cell Broadband Engine (Cell/BE), and Graphics Processing Units (GPUs). Our results indicate that application performance depends both on the characteristics of the parallel architectures and on the computational requirements of the core functions of the respective applications. For MrBayes the best overall performance is achieved on general-purpose multi-core processors, for NAMD on the Cell/BE, and for TPC-H on GPUs.