The Manticore project

  • Authors: Matthew Fluet
  • Affiliation: Rochester Institute of Technology, Rochester, NY, USA
  • Venue: Proceedings of the 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing
  • Year: 2013

Abstract

The Manticore project is a research effort to design and implement a parallel functional programming language that targets commodity multicore and shared-memory multiprocessors. Our language is a dialect of Standard ML, called Parallel ML (PML), that starts with a strict, mutation-free functional core and extends it with both implicitly-threaded constructs for fine-grain parallelism and CML-style explicit concurrency for coarse-grain parallelism. We have a prototype implementation that demonstrates both reasonable sequential performance and good scalability on both 32-core Intel and 48-core AMD machines. Our past research contributions include: a parallel implementation of CML; a novel infrastructure for nested schedulers; a collection of expressive implicitly-threaded parallel constructs with mostly sequential semantics; and a Lazy Tree Splitting (LTS) strategy for performance-robust work stealing of parallel computations over irregular tree-like data structures.

In this talk, I will motivate and describe the high points in both the design of the Parallel ML language and the implementation of the Manticore compiler and runtime system. After briefly discussing some notable results among our past research contributions, I will highlight our most recent research efforts. In one line of work, we have demonstrated the importance of treating even commodity desktops and servers as non-uniform memory access (NUMA) machines. This is particularly important for the scalability of parallel garbage collection, where unbalanced work with lower memory traffic is often better than balanced work with higher memory traffic. In another line of work, we have explored data-only flattening, a compilation strategy for nested data parallelism that eschews the traditional vectorization approach, which transforms both control and data and was designed for wide-vector SIMD architectures. Instead, data-only flattening transforms nested data structures but leaves control structures intact, a strategy that is better suited to multicore architectures.

Finally, we are exploring language features that provide controlled forms of (deterministic and nondeterministic) mutable state within parallel computations. We begin with the observation that some parallel stateful algorithms exhibit significantly better performance than the corresponding parallel algorithms without mutable state. To support such algorithms, we extend Manticore with memoization of pure functions, using a high-performance implementation of a dynamically sized, parallel hash table to provide scalable performance. We are also exploring various execution models for general mutable state, with the crucial design criterion that all executions should preserve the ability to reason locally about the behavior of code.
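The memoization idea mentioned above can be sketched as follows. This is a minimal Python illustration of memoizing a pure function behind a thread-safe table, not Manticore's implementation: the names `ConcurrentMemoTable` and `fib_memo` are hypothetical, and a single lock around a dictionary stands in for the dynamically sized, parallel hash table the abstract describes. Because the memoized function is pure, concurrent workers that race on the same argument can only duplicate work, never produce a wrong answer.

```python
import threading

class ConcurrentMemoTable:
    """Thread-safe memo table for a pure function (illustrative sketch).

    A real parallel implementation would use a lock-free, dynamically
    sized hash table; here one lock over a dict stands in for it.
    """
    def __init__(self, fn):
        self.fn = fn
        self.table = {}
        self.lock = threading.Lock()

    def __call__(self, arg):
        # Fast path: return a previously computed result.
        with self.lock:
            if arg in self.table:
                return self.table[arg]
        # Compute outside the lock; purity makes duplicated work
        # harmless (wasted effort, never an incorrect result).
        result = self.fn(arg)
        with self.lock:
            self.table.setdefault(arg, result)
            return self.table[arg]

# Example: an exponential-time pure function whose recursive calls
# overlap heavily, so memoization pays off across parallel callers.
def fib(n):
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

fib_memo = ConcurrentMemoTable(fib)

# Several threads request overlapping computations concurrently.
threads = [threading.Thread(target=fib_memo, args=(30,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(fib_memo(30))  # 832040
```

The design point mirrors the observation in the abstract: adding this controlled form of mutable state (the memo table) changes the asymptotic cost of the computation while keeping the observable behavior of the pure function intact, so local reasoning about `fib` is preserved.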