Low-pain, high-gain multicore programming in Haskell: coordinating irregular symbolic computations on multicore architectures

  • Authors:
  • Abdallah Deeb I. Al Zain;Kevin Hammond;Jost Berthold;Phil Trinder;Greg Michaelson;Mustafa Aswad

  • Affiliations:
Heriot-Watt University, Edinburgh, United Kingdom;University of St Andrews, St Andrews, United Kingdom;Philipps-Universität, Marburg, Germany;Heriot-Watt University, Edinburgh, United Kingdom;Heriot-Watt University, Edinburgh, United Kingdom;Heriot-Watt University, Edinburgh, United Kingdom

  • Venue:
  • Proceedings of the 4th workshop on Declarative aspects of multicore programming
  • Year:
  • 2009

Abstract

With the emergence of commodity multicore architectures, exploiting tightly-coupled parallelism has become increasingly important. Functional programming languages such as Haskell are, in principle, well placed to take advantage of this trend, offering the ability to easily identify large amounts of fine-grained parallelism. Unfortunately, real performance benefits have often proved hard to realise in practice. This paper reports on a new approach using middleware constructed with the Eden parallel dialect of Haskell. Our approach is "low pain" in the sense that the programmer constructs a parallel program by inserting a small number of higher-order algorithmic skeletons at key points in the program. It is "high gain" in the sense that we obtain good parallel speedups. Our approach is unusual in that we do not attempt to use shared memory directly, but rather coordinate parallel computations using a message-passing implementation. This has a number of advantages. First, coordination, i.e. locking and communication, is confined to limited shared-memory areas, essentially the communication buffers, and is isolated within well-understood libraries. Second, the coarse thread granularity that we obtain reduces coordination overheads: locks are normally needed only on (relatively large) messages, not on individual data items, as is often the case for simple shared-memory implementations. Finally, cache-coherency requirements are reduced, since individual tasks do not share caches and can garbage-collect independently.

We report results for two representative computational algebra problems. Computational algebra is a challenging application area that has not been widely studied in the general parallelism community. Its applications have high computational demands and are, in principle, often suitable for parallel execution, but they usually display a high degree of irregularity in terms of both task and data structure, which makes it difficult to construct parallel applications that perform well in practice. Using our system, we obtain both extremely good processor utilisation (97%) and very good absolute speedups (up to 7.7) on an eight-core machine.
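To make the "low pain" claim concrete, here is a minimal sketch of the skeleton-insertion style the abstract describes. It uses GHC's standard parallel package (Control.Parallel.Strategies) rather than the Eden middleware the paper builds on; Eden's process-based parMap skeleton is applied in essentially the same way. The sumOfDivisors workload and task list are illustrative stand-ins, not from the paper, chosen to mimic the irregular per-task cost of computational-algebra problems.

```haskell
-- Sketch only: GHC's 'parallel' package stands in for the Eden middleware.
import Control.Parallel.Strategies (parMap, rdeepseq)

-- An artificially irregular task: cost grows with n, so different list
-- elements take different amounts of work, loosely mimicking the
-- irregular computational-algebra tasks described in the abstract
-- (illustrative stand-in, not taken from the paper).
sumOfDivisors :: Integer -> Integer
sumOfDivisors n = sum [d | d <- [1 .. n], n `mod` d == 0]

main :: IO ()
main = do
  let tasks = [50000 .. 50063] :: [Integer]
  -- Sequential version would be:  map sumOfDivisors tasks
  -- Parallel version: insert one higher-order skeleton at the key point.
  print (sum (parMap rdeepseq sumOfDivisors tasks))
```

Compiled with ghc -threaded and run with +RTS -N8, each list element becomes a coarse-grained parallel task. In Eden, the same call shape instead creates independent processes that exchange results by message passing, which is what confines locking to the communication buffers as the abstract describes.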