This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is “work stealing,” in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies.

Specifically, our analysis shows that the expected time to execute a fully strict computation on $P$ processors using our work-stealing scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors. Moreover, the space required by the execution is at most $S_1 P$, where $S_1$ is the minimum serial space requirement. We also show that the expected total communication of the algorithm is at most $O(P T_\infty (1 + n_d) S_{\max})$, where $S_{\max}$ is the size of the largest activation record of any thread and $n_d$ is the maximum number of times that any thread synchronizes with its parent. This communication bound justifies the folk wisdom that work-stealing schedulers are more communication efficient than their work-sharing counterparts. All three of these bounds are existentially optimal to within a constant factor.
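To make the idea of work stealing concrete, the following is a minimal simulation sketch, not the scheduler analyzed in the paper. It assumes a toy computation (a complete binary spawn tree of unit-time tasks with no join work), a per-processor double-ended queue from which the owner works at the bottom and thieves take from the top, and uniformly random victim selection. The function name `simulate_work_stealing` and all of its parameters are hypothetical, chosen for illustration only. For this toy workload the total work is 2^(depth+1) - 1 tasks and the critical path is depth + 1 tasks, so the step count can be compared against those two quantities.

```python
import random
from collections import deque


def simulate_work_stealing(depth, num_procs, seed=0):
    """Simulate randomized work stealing on a complete binary spawn tree.

    Each task takes one time step; a task at level < depth spawns two
    children. Returns (steps, executed_tasks, steal_attempts).
    This is an illustrative model under the assumptions stated above.
    """
    rng = random.Random(seed)
    # One deque per processor: right end is the "bottom" (owner side),
    # left end is the "top" (thief side).
    deques = [deque() for _ in range(num_procs)]
    deques[0].append(0)  # the root task starts at level 0 on processor 0

    total_work = 2 ** (depth + 1) - 1  # number of nodes in the tree
    executed = 0
    steps = 0
    steal_attempts = 0

    while executed < total_work:
        steps += 1
        for p in range(num_procs):
            if deques[p]:
                # Busy processor: run the most recently pushed task.
                level = deques[p].pop()
                executed += 1
                if level < depth:
                    deques[p].append(level + 1)
                    deques[p].append(level + 1)
            elif num_procs > 1:
                # Idle processor: spend this step on one steal attempt.
                steal_attempts += 1
                victim = rng.choice([q for q in range(num_procs) if q != p])
                if deques[victim]:
                    # Take the oldest task from the top of the victim's deque.
                    deques[p].append(deques[victim].popleft())
    return steps, executed, steal_attempts


if __name__ == "__main__":
    depth = 10
    work = 2 ** (depth + 1) - 1   # plays the role of T_1 in this toy model
    span = depth + 1              # plays the role of T_infinity
    for procs in (1, 2, 4, 8):
        steps, executed, attempts = simulate_work_stealing(depth, procs, seed=1)
        print(f"P={procs}: steps={steps}, work/P={work / procs:.1f}, "
              f"span={span}, steal attempts={attempts}")
```

Running the script prints, for each processor count, the simulated step count next to work/P and the span. In this model the step count can never fall below either quantity, which mirrors the two lower bounds behind the $T_1/P + O(T_\infty)$ running time. The simulation does not model activation-record sizes or synchronization with parents, so it says nothing about the space or communication bounds.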