Modeling multigrain parallelism on heterogeneous multi-core processors: a case study of the Cell BE

Authors:
Filip Blagojevic;Xizhou Feng;Kirk W. Cameron;Dimitrios S. Nikolopoulos
Affiliations:
Center for High-End Computing Systems, Department of Computer Science, Virginia Tech;Center for High-End Computing Systems, Department of Computer Science, Virginia Tech;Center for High-End Computing Systems, Department of Computer Science, Virginia Tech;Center for High-End Computing Systems, Department of Computer Science, Virginia Tech
Venue:
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Year:
2008



Abstract

Heterogeneous multi-core processors invest the most significant portion of their transistor budget in customized "accelerator" cores, while using a small number of conventional low-end cores for supplying computation to accelerators. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is to date an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program using conventional processors and accelerators simultaneously. More specifically, the model reveals optimal degrees of multi-dimensional, task-level and data-level concurrency, to maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications on a multi-processor based on the IBM Cell Broadband Engine.