Efficient execution of multithreaded iterative numerical computations requires careful handling of data dependencies. This paper presents an original way to express and schedule general dataflow multithreaded computations. We propose a distributed dataflow stack implementation that efficiently supports work stealing and achieves provable performance on heterogeneous grids. Its properties include non-blocking local stack accesses and runtime generation of optimized one-sided data communications.
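
To make the idea of a dataflow stack with work stealing more concrete, the following is a minimal single-process sketch, not the implementation described in the paper. It assumes that each task declares the data it reads and writes, that tasks are kept in creation order, and that a thief takes the oldest task whose earlier conflicting tasks have all completed. All names (`Task`, `DataflowStack`, `steal`, `schedule_in_waves`) are hypothetical, and the sketch models neither the non-blocking accesses nor the one-sided communications.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reads: frozenset
    writes: frozenset
    state: str = "pending"  # "pending" -> "running" -> "done"


def conflicts(earlier, later):
    """True if `later` must wait for `earlier` (RAW, WAW, or WAR hazard)."""
    return bool(
        earlier.writes & (later.reads | later.writes)
        or earlier.reads & later.writes
    )


class DataflowStack:
    """Hypothetical stack of tasks kept in creation order, oldest first."""

    def __init__(self):
        self.tasks = []

    def push(self, name, reads=(), writes=()):
        task = Task(name, frozenset(reads), frozenset(writes))
        self.tasks.append(task)
        return task

    def is_ready(self, index):
        # A task is ready when no unfinished earlier task conflicts with it.
        task = self.tasks[index]
        return not any(
            earlier.state != "done" and conflicts(earlier, task)
            for earlier in self.tasks[:index]
        )

    def steal(self):
        """Claim and return the oldest pending ready task, or None."""
        for index, task in enumerate(self.tasks):
            if task.state == "pending" and self.is_ready(index):
                task.state = "running"
                return task
        return None


def schedule_in_waves(stack):
    """Repeatedly steal every ready task, then complete the whole wave."""
    waves = []
    while True:
        wave = []
        while (task := stack.steal()) is not None:
            wave.append(task)
        if not wave:
            return waves
        for task in wave:
            task.state = "done"
        waves.append([task.name for task in wave])


stack = DataflowStack()
stack.push("a", writes={"x"})
stack.push("b", writes={"y"})
stack.push("c", reads={"x", "y"}, writes={"z"})
stack.push("d", reads={"z"})
waves = schedule_in_waves(stack)
print(waves)
```

In this sketch, `a` and `b` touch disjoint data and can be stolen together, while `c` waits for both and `d` waits for `c`. Readiness is computed lazily by scanning the stack at steal time, so pushing a task stays cheap for the worker that owns the stack.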