This paper presents provably work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique. Previous approaches typically give each processor a deque in which it stores its ready tasks locally; a processor that runs out of work steals a ready task from the deque of a randomly selected processor. This paper instead presents an original implementation of work stealing that uses no deque, relying on a distributed list in order to bound the overhead of task creation. The paper contains both theoretical and experimental results bounding the work and the running time.
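To make the contrast with deque-based schedulers concrete, the following is a minimal illustrative sketch in C++ of deque-free work stealing for an STL-style reduction. It is not the paper's implementation: it does not reproduce the distributed list or its bounds, and instead assumes the simplest possible arrangement, one index interval per worker from which an idle worker steals the back half. The names `Interval`, `pop_front`, `steal_back`, `parallel_sum`, and the `grain` parameter are all hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical work descriptor: the unprocessed index range [begin, end)
// owned by one worker. No task objects are created and no deque is used.
struct Interval {
    std::mutex m;
    std::size_t begin = 0, end = 0;

    // Owner side: extract up to `grain` indices from the front.
    bool pop_front(std::size_t grain, std::size_t& b, std::size_t& e) {
        std::lock_guard<std::mutex> lock(m);
        if (begin >= end) return false;
        b = begin;
        e = std::min(end, begin + grain);
        begin = e;
        return true;
    }

    // Thief side: take the back half of whatever is still unprocessed.
    bool steal_back(std::size_t& b, std::size_t& e) {
        std::lock_guard<std::mutex> lock(m);
        std::size_t remaining = end - begin;
        if (remaining < 2) return false;  // too little left to split
        e = end;
        b = end - remaining / 2;
        end = b;
        return true;
    }

    // Install a freshly stolen range as this worker's own work.
    void reset(std::size_t b, std::size_t e) {
        std::lock_guard<std::mutex> lock(m);
        begin = b;
        end = e;
    }
};

// Sums `data` with `workers` threads. All work starts on worker 0; the
// other workers obtain work only by stealing.
long long parallel_sum(const std::vector<int>& data, unsigned workers,
                       std::size_t grain = 1024) {
    if (workers == 0) workers = 1;
    grain = std::max<std::size_t>(grain, 1);

    std::vector<Interval> intervals(workers);
    intervals[0].reset(0, data.size());
    std::atomic<std::size_t> todo(data.size());  // elements not yet summed
    std::vector<long long> partial(workers, 0);  // one slot per worker

    auto run = [&](unsigned self) {
        unsigned victim = self;
        while (todo.load() > 0) {
            std::size_t b = 0, e = 0;
            if (intervals[self].pop_front(grain, b, e)) {
                long long s = 0;
                for (std::size_t i = b; i < e; ++i) s += data[i];
                partial[self] += s;
                todo.fetch_sub(e - b);
            } else {
                // Out of local work: probe the next victim in round-robin order.
                victim = (victim + 1) % workers;
                if (victim != self && intervals[victim].steal_back(b, e))
                    intervals[self].reset(b, e);
                else
                    std::this_thread::yield();
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned w = 1; w < workers; ++w) threads.emplace_back(run, w);
    run(0);
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A call such as `parallel_sum(v, 4)` returns the same value as `std::accumulate` over `v` with a `long long` accumulator. In this sketch the per-steal cost is a constant-time update of two indices, which is the kind of overhead a deque of pre-created tasks does not offer; victims are probed round-robin here for simplicity, whereas the deque-based schemes described above select a victim at random.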