Heterogeneous multi-core processors devote most of their transistor budget to customized "accelerator" cores, and use a small number of conventional low-end cores to supply computation to the accelerators. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is to date an ad hoc process that relies heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program that uses conventional processors and accelerators simultaneously. More specifically, the model reveals the optimal degrees of multi-dimensional (task-level and data-level) concurrency that maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications onto a multi-processor based on the IBM Cell Broadband Engine.