Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features that favor the NDP style, efficient compilation of NDP programs, and common NDP operations such as parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, parallel overhead will erode performance. If, on the other hand, it creates too few large chunks of work, too much of the processing will be sequential and processors will sit idle. Lazy Binary Splitting was recently proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
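The core idea of lazy tree splitting can be illustrated with a small sequential sketch. The sketch assumes several things. Parallel arrays are ropes, meaning binary trees whose leaves hold short chunks of elements. A worker walks the leaves left to right and, after each leaf, asks a `hungry()` predicate whether it should give work away. When the answer is yes, the worker splits the *remaining* unprocessed leaves into two halves, which a real runtime would fork in parallel. `hungry` is a hypothetical stand-in for the scheduler's check; in Lazy Binary Splitting the analogous test is, roughly, whether the worker's local work-stealing deque is empty. The names `Leaf`, `Cat`, `from_list`, and `map_lts` are illustrative only, not PML's API. The sketch also makes two simplifications. It keeps a flat list of pending leaves where the actual implementation uses a zipper-like context over the rope. It checks hunger once per leaf rather than after a fixed number of elements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, List, Union


@dataclass
class Leaf:
    items: List[Any]  # a short chunk of array elements


@dataclass
class Cat:
    left: "Rope"  # concatenation node: left ++ right
    right: "Rope"


Rope = Union[Leaf, Cat]


def build(chunks: List[List[Any]]) -> Rope:
    """Build a balanced rope from a non-empty list of leaf chunks."""
    if len(chunks) == 1:
        return Leaf(chunks[0])
    mid = len(chunks) // 2
    return Cat(build(chunks[:mid]), build(chunks[mid:]))


def from_list(xs: List[Any], leaf_size: int = 4) -> Rope:
    """Chunk a list into leaves of at most leaf_size (hypothetical parameter)."""
    chunks = [xs[i:i + leaf_size] for i in range(0, len(xs), leaf_size)]
    return build(chunks) if chunks else Leaf([])


def leaves(r: Rope) -> List[List[Any]]:
    """In-order list of leaf chunks."""
    if isinstance(r, Leaf):
        return [r.items]
    return leaves(r.left) + leaves(r.right)


def to_list(r: Rope) -> List[Any]:
    """Flatten a rope back into a list, preserving order."""
    if isinstance(r, Leaf):
        return list(r.items)
    return to_list(r.left) + to_list(r.right)


def map_lts(f: Callable[[Any], Any], rope: Rope,
            hungry: Callable[[], bool]) -> Rope:
    """Map f over a rope, splitting the remaining work only when hungry() says so."""
    pending = leaves(rope)       # unprocessed leaves (stand-in for a zipper)
    done: List[List[Any]] = []   # mapped leaves, in order
    i = 0
    while i < len(pending):
        # Process one leaf sequentially: this is the cheap, low-overhead path.
        done.append([f(x) for x in pending[i]])
        i += 1
        rest = pending[i:]
        # Split lazily: only if there is enough left to divide and the
        # scheduler signals that other workers may need work.
        if len(rest) >= 2 and hungry():
            mid = len(rest) // 2
            # A real runtime would fork these two calls in parallel;
            # here they simply run one after the other.
            left = map_lts(f, build(rest[:mid]), hungry)
            right = map_lts(f, build(rest[mid:]), hungry)
            return Cat(build(done), Cat(left, right))
    return build(done)
```

Passing `lambda: False` as `hungry` makes `map_lts` one sequential left-to-right pass with no splits. Passing `lambda: True` makes it split at every opportunity. A real scheduler answers somewhere between these extremes depending on load. That adaptivity is what lets this style of decomposition avoid a hand-tuned, per-program chunk size.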