Algorithmic skeletons aim to simplify parallel programming by providing recurring forms of program structure as predefined components. We present a fully distributed task-parallel skeleton for a very general class of divide-and-conquer algorithms on MIMD machines with distributed memory, and compare it to a simple master-worker design. Experimental results for example applications such as Mergesort, the Karatsuba multiplication algorithm, and Strassen's algorithm for matrix multiplication show that the distributed workpool achieves good runtimes and, in particular, good scalability. We also discuss implementation aspects of the distributed skeleton in detail, such as the underlying data structures and the load balancing strategy.
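To illustrate what a divide-and-conquer skeleton abstracts, the following is a minimal sequential sketch in Python. It is not the distributed implementation described above: the function name `divide_and_conquer` and its four customizing parameters (`is_simple`, `solve`, `divide`, `combine`) are assumptions chosen for illustration, and no workpool or load balancing is modeled. The recursive calls on the subproblems are independent of each other, which is the property a parallel skeleton could exploit by distributing them across processors.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type
S = TypeVar("S")  # solution type


def divide_and_conquer(
    problem: P,
    is_simple: Callable[[P], bool],
    solve: Callable[[P], S],
    divide: Callable[[P], List[P]],
    combine: Callable[[List[S]], S],
) -> S:
    """Hypothetical sequential skeleton: the user supplies only the four functions."""
    if is_simple(problem):
        return solve(problem)
    subproblems = divide(problem)
    # Each subproblem is solved independently; a parallel skeleton
    # would hand these recursive calls to different workers.
    subsolutions = [
        divide_and_conquer(sub, is_simple, solve, divide, combine)
        for sub in subproblems
    ]
    return combine(subsolutions)


def merge(parts: List[List[int]]) -> List[int]:
    """Merge exactly two sorted lists into one sorted list."""
    left, right = parts
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def mergesort(xs: List[int]) -> List[int]:
    """Mergesort expressed as an instance of the skeleton."""
    return divide_and_conquer(
        list(xs),
        is_simple=lambda p: len(p) <= 1,
        solve=lambda p: p,
        divide=lambda p: [p[: len(p) // 2], p[len(p) // 2:]],
        combine=merge,
    )
```

Only the four customizing functions mention sorting, so the same skeleton could in principle be instantiated for other divide-and-conquer algorithms, such as Karatsuba or Strassen multiplication, by swapping those functions.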