Deque-Free Work-Optimal Parallel STL Algorithms

Authors:
Daouda Traoré;Jean-Louis Roch;Nicolas Maillard;Thierry Gautier;Julien Bernard
Affiliations:
INRIA Moais research team, CNRS LIG lab., Grenoble University, France;INRIA Moais research team, CNRS LIG lab., Grenoble University, France;Instituto de Informática, Univ. Federal Rio Grande do Sul, Porto Alegre, Brazil;INRIA Moais research team, CNRS LIG lab., Grenoble University, France;INRIA Moais research team, CNRS LIG lab., Grenoble University, France
Venue:
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Year:
2008

Abstract

This paper presents provably work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique. Previous approaches typically give each processor a deque in which it locally stores its ready tasks; a processor that runs out of work steals a ready task from the deque of a randomly selected processor. This paper instead presents an original implementation of work-stealing that uses no deque, relying on a distributed list in order to bound the overhead of task creation. The paper contains both theoretical and experimental results bounding the work and the running time.
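
The general idea of stealing work without a per-processor task deque can be illustrated with a small sketch. The code below is not the paper's implementation and does not reproduce its distributed list: it is a simplified illustration, under stated assumptions, in which each worker holds a single descriptor of its remaining index interval, and an idle worker splits off half of a victim's interval on demand, so no task object is created until a steal actually happens. The names `WorkRange`, `pop_front`, `steal_back`, `assign`, and `adaptive_sum`, the `grain` parameter, and the round-robin victim choice (used here instead of random selection) are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical descriptor: the work left to one worker is the interval [begin, end).
struct WorkRange {
    std::mutex m;
    std::size_t begin = 0, end = 0;

    // Owner side: take at most `grain` indices from the front of the interval.
    bool pop_front(std::size_t grain, std::size_t& b, std::size_t& e) {
        std::lock_guard<std::mutex> lock(m);
        if (begin >= end) return false;
        b = begin;
        e = std::min(end, begin + grain);
        begin = e;
        return true;
    }

    // Thief side: take the back half of whatever is left (needs at least 2 indices).
    bool steal_back(std::size_t& b, std::size_t& e) {
        std::lock_guard<std::mutex> lock(m);
        const std::size_t left = end - begin;
        if (left < 2) return false;
        b = end - left / 2;
        e = end;
        end = b;
        return true;
    }

    // Install a freshly stolen interval as the owner's new work.
    void assign(std::size_t b, std::size_t e) {
        std::lock_guard<std::mutex> lock(m);
        begin = b;
        end = e;
    }
};

// Sums v with `workers` threads; work is split only when an idle worker steals.
long long adaptive_sum(const std::vector<int>& v, unsigned workers,
                       std::size_t grain = 1024) {
    workers = std::max(1u, workers);
    std::vector<WorkRange> ranges(workers);
    std::vector<long long> partial(workers, 0);
    ranges[0].assign(0, v.size());  // all work starts on worker 0

    auto run = [&](unsigned id) {
        long long local = 0;
        for (;;) {
            std::size_t b = 0, e = 0;
            // Process the local interval chunk by chunk.
            while (ranges[id].pop_front(grain, b, e)) {
                local = std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(b),
                                        v.begin() + static_cast<std::ptrdiff_t>(e),
                                        local);
            }
            // Local interval is empty: try each other worker once, round-robin.
            bool stolen = false;
            for (unsigned k = 1; k < workers && !stolen; ++k) {
                if (ranges[(id + k) % workers].steal_back(b, e)) {
                    ranges[id].assign(b, e);
                    stolen = true;
                }
            }
            if (!stolen) break;  // nothing to steal: this worker retires
        }
        partial[id] = local;
    };

    std::vector<std::thread> pool;
    for (unsigned id = 1; id < workers; ++id) pool.emplace_back(run, id);
    run(0);
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A call such as `adaptive_sum(v, 4)` returns the same value as a sequential `std::accumulate` over `v`. In this sketch a worker that finds nothing to steal simply retires, which keeps the result correct (every index is processed by whichever worker extracts it) but may leave a late-starting worker idle; a production scheduler would retry instead.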