The Manchester prototype dataflow computer
Communications of the ACM - Special section on computer architecture
MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Serial combinators: “optimal” grains of parallelism
Proc. of a conference on Functional programming languages and computer architecture
Compile-time partitioning and scheduling of parallel programs
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
ORBIT: an optimizing compiler for scheme
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Annual review of computer science vol. 1, 1986
Managing stack frames in Smalltalk
SIGPLAN '87 Papers of the Symposium on Interpreters and interpretive techniques
An assessment of multilisp: lessons from experience
International Journal of Parallel Programming
Preliminary results with the initial implementation of Qlisp
LFP '88 Proceedings of the 1988 ACM conference on LISP and functional programming
Adaptive bitonic sorting: an optimal parallel algorithm for shared-memory machines
SIAM Journal on Computing
Workcrews: an abstraction for controlling parallelism
International Journal of Parallel Programming
Multiprocessor execution of functional programs
International Journal of Parallel Programming
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
LimitLESS directories: A scalable cache coherence scheme
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Connection Machine Lisp: fine-grained parallel symbolic processing
LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Control of parallelism in the Manchester Dataflow Machine
Proceedings of the Functional Programming Languages and Computer Architecture
Low-Cost Process Creation and Dynamic Partitioning in Qlisp
Proceedings of the US/Japan Workshop on Parallel Lisp: Languages and Systems
Proceedings of the US/Japan Workshop on Parallel Lisp: Languages and Systems
Queue-based multi-processing LISP
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Orbit: an optimizing compiler for scheme
Orbit: an optimizing compiler for scheme
Chores: enhanced run-time support for shared-memory parallel computing
ACM Transactions on Computer Systems (TOCS)
Waiting algorithms for synchronization in large-scale multiprocessors
ACM Transactions on Computer Systems (TOCS)
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Leapfrogging: a portable technique for implementing efficient futures
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic active messages: a mechanism for scheduling communication with computation
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A hybrid execution model for fine-grained languages on distributed memory multicomputers
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Ordered multithreading: a novel technique for exploiting thread-level parallelism
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
GUM: a portable parallel implementation of Haskell
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
StackThreads/MP: integrating futures into calling standards
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Space-efficient scheduling of nested parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors
Compiler optimizations for scalable parallel systems
Supporting dynamic data structures with Olden
Compiler optimizations for scalable parallel systems
A hierarchical load-balancing framework for dynamic multithreaded computations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Evaluating the performance limitations of MPMD communication
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Load balancing in a parallel graph reducer
Trends in functional programming
Designing Scalable Object Oriented Parallel Applications (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Run-Time System for Dynamic Grain Packing
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
The Multi-architecture Performance of the Parallel Functional Language GP H (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
A SCOOPP Evaluation on Packing Parallel Objects in Run-Time
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Performance Evaluation of OpenMP Applications with Nested Parallelism
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Comparing Parallel Functional Languages: Programming and Performance
Higher-Order and Symbolic Computation
Optimistic evaluation: an adaptive evaluation strategy for non-strict programs
ICFP '03 Proceedings of the eighth ACM SIGPLAN international conference on Functional programming
Parallel and Distributed Haskells
Journal of Functional Programming
Algorithm + strategy = parallelism
Journal of Functional Programming
EQUALS – a fast parallel implementation of a lazy language
Journal of Functional Programming
Transparent proxies for java futures
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Supporting exception handling for futures in Java
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Backtracking-based load balancing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
As-if-serial exception handling semantics for Java futures
Science of Computer Programming
Parallel performance tuning for Haskell
Proceedings of the 2nd ACM SIGPLAN symposium on Haskell
Experience with SC: transformation-based implementation of various extensions to C
Proceedings of the 2007 International Lisp Conference
Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs
Proceedings of the VLDB Endowment
Efficient shared-memory support for parallel graph reduction
Future Generation Computer Systems
An adaptive task creation strategy for work-stealing scheduling
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A tutorial on parallel and concurrent programming in Haskell
AFP'08 Proceedings of the 6th international conference on Advanced functional programming
Hardware/software support for adaptive work-stealing in on-chip multiprocessor
Journal of Systems Architecture: the EUROMICRO Journal
Granularity-Aware Work-Stealing for Computationally-Uniform Grids
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Seq no more: better strategies for parallel Haskell
Proceedings of the third ACM Haskell symposium on Haskell
Lifeline-based global load balancing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
AC: composable asynchronous IO for native languages
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Adaptive encoding of multimedia streams on MPSoC
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
Tying memory management to parallel programming models
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
Lightweight lexical closures for legitimate execution stack access
CC'06 Proceedings of the 15th international conference on Compiler Construction
Haskell vs. f# vs. scala: a high-level language features and parallelism support comparison
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Parallelism granules aggregation with the T-system
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Adaptive granularity control in task parallel programs using multiversioning
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Streaming similarity search over one billion tweets using parallel locality-sensitive hashing
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
When a parallel algorithm is written naturally, the resulting program often produces tasks of a finer grain than an implementation can exploit efficiently. Two solutions to the granularity problem that combine parallel tasks dynamically at runtime are discussed. The simpler load-based inlining method, in which tasks are combined based on dynamic bad level, is rejected in favor of the safer and more robust lazy task creation method, in which tasks are created only retroactively as processing results become available. The strategies grew out of work on Mul-T, an efficient parallel implementation of Scheme, but could be used with other languages as well. Mul-T implementations of lazy task creation are described for two contrasting machines, and performance statistics that show the method's effectiveness are presented. Lazy task creation is shown to allow efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.