The Manticore project is a research effort to design and implement a parallel functional programming language that targets commodity multicore and shared-memory multiprocessors. Our language, a dialect of Standard ML called Parallel ML (PML), starts with a strict, mutation-free functional core and extends it with both implicitly-threaded constructs for fine-grain parallelism and CML-style explicit concurrency for coarse-grain parallelism. We have a prototype implementation that demonstrates both reasonable sequential performance and good scalability on 32-core Intel and 48-core AMD machines. Our past research contributions include: a parallel implementation of CML; a novel infrastructure for nested schedulers; a collection of expressive implicitly-threaded parallel constructs with mostly sequential semantics; and a Lazy Tree Splitting (LTS) strategy for performance-robust work stealing of parallel computations over irregular tree-like data structures.

In this talk, I will motivate and describe the high points in both the design of the Parallel ML language and the implementation of the Manticore compiler and runtime system. After briefly discussing some notable results among our past research contributions, I will highlight our most recent research efforts. In one line of work, we have demonstrated the importance of treating even commodity desktops and servers as non-uniform memory access (NUMA) machines. This is particularly important for the scalability of parallel garbage collection, where unbalanced work with low memory traffic is often better than balanced work with high memory traffic. In another line of work, we have explored data-only flattening, a compilation strategy for nested data parallelism that eschews the traditional vectorization approach, which transforms both control and data and was designed for wide-vector SIMD architectures.
Instead, data-only flattening transforms nested data structures but leaves control structures intact, a strategy that is better suited to multicore architectures. Finally, we are exploring language features that provide controlled forms of (deterministic and nondeterministic) mutable state within parallel computations. We begin with the observation that some parallel stateful algorithms exhibit significantly better performance than the corresponding parallel algorithms without mutable state. To support such algorithms, we extend Manticore with memoization of pure functions, using a high-performance implementation of a dynamically sized, parallel hash table to provide scalable performance. We are also exploring various execution models for general mutable state, with the crucial design criterion that all executions should preserve the ability to reason locally about the behavior of code.
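To give a rough feel for the memoization idea, the sketch below backs a pure function with a concurrency-safe hash table so that parallel callers share previously computed results. This is an illustrative Go sketch, not Manticore's implementation: Go's `sync.Map` stands in for the dynamically sized parallel hash table, and `memoFib` is a hypothetical pure function chosen only for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a concurrency-safe memo table shared by all parallel callers.
// It stands in for a dynamically sized parallel hash table.
var table sync.Map // keys: int, values: int

// memoFib memoizes a pure function (Fibonacci, purely illustrative).
// Because the function is pure, two goroutines that race to compute the
// same entry simply store the same value, so no extra locking is needed.
func memoFib(n int) int {
	if v, ok := table.Load(n); ok {
		return v.(int) // cache hit: reuse a result computed by any caller
	}
	var r int
	if n < 2 {
		r = n
	} else {
		r = memoFib(n-1) + memoFib(n-2)
	}
	table.Store(n, r) // publish the result for other parallel callers
	return r
}

func main() {
	// Several goroutines evaluate the same function and share the table.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			memoFib(30)
		}()
	}
	wg.Wait()
	fmt.Println(memoFib(30))
}
```

The key property exploited here is purity: duplicated work between racing threads is wasted but never incorrect, which is what lets the memo table scale without serializing callers.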