High-performance architectures are increasingly heterogeneous, combining shared-memory and distributed-memory components. Programming such architectures is complicated, and performance portability becomes a major issue as the architectures evolve. This paper proposes a new architectural cost model for heterogeneous architectures that accounts for cache size, and demonstrates a skeleton-based programming model that simplifies programming them. We further demonstrate that skeletons can exploit the cost model to improve load balancing on heterogeneous architectures. The heterogeneous skeleton model facilitates performance portability by using the architectural cost model to balance load automatically across the heterogeneous components of the architecture. For both a data-parallel benchmark and a realistic image-processing program, we obtain good performance for the heterogeneous skeleton on homogeneous shared-memory and distributed-memory architectures, and on three heterogeneous architectures. We also show that taking cache size into account in the model leads to improved balance and performance.
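To make the idea of cost-model-driven load balancing concrete, the following is a minimal sketch rather than the paper's actual model. It assumes a hypothetical per-node weight built from core count, clock speed, and cache size, and splits a data-parallel workload in proportion to those weights. The `Node` fields, the `node_weight` formula, and the `cache_exponent` parameter are all illustrative assumptions that do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one component of a heterogeneous machine."""
    name: str
    cores: int
    clock_ghz: float
    cache_mb: float


def node_weight(node, cache_exponent=0.5):
    # Assumed weighting, NOT the paper's cost model: raw compute capacity
    # (cores * clock) scaled by a power of the cache size.
    # cache_exponent=0 ignores cache size entirely.
    return node.cores * node.clock_ghz * node.cache_mb ** cache_exponent


def split_work(total_items, nodes, cache_exponent=0.5):
    """Divide total_items among nodes in proportion to their weights."""
    weights = [node_weight(n, cache_exponent) for n in nodes]
    total_weight = sum(weights)
    ideal = [total_items * w / total_weight for w in weights]
    shares = [int(x) for x in ideal]  # round each share down first
    # Hand the leftover items to the nodes with the largest fractional parts,
    # so the shares always sum to total_items.
    leftover = total_items - sum(shares)
    by_fraction = sorted(range(len(nodes)),
                         key=lambda i: ideal[i] - shares[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


nodes = [
    Node("big", cores=8, clock_ghz=2.0, cache_mb=4.0),
    Node("small", cores=4, clock_ghz=2.0, cache_mb=1.0),
]
print(split_work(100, nodes))                     # cache-aware split
print(split_work(100, nodes, cache_exponent=0))   # cache-oblivious split
```

With these made-up node descriptions, the cache-aware split assigns 80 and 20 items, while the cache-oblivious split assigns 67 and 33. This illustrates how including cache size in the weights can shift the balance between nodes; the exponent would need to be calibrated against measurements on a real machine.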