Feedback directed implicit parallelism

  • Authors:
  • Tim Harris; Satnam Singh

  • Affiliations:
  • Microsoft Research, Cambridge, United Kingdom (both authors)

  • Venue:
  • ICFP '07: Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming
  • Year:
  • 2007

Abstract

In this paper we present an automated way of using spare CPU resources within a shared-memory multi-processor or multi-core machine. Our approach is (i) to profile the execution of a program, (ii) to identify from this profile pieces of work that are promising sources of parallelism, (iii) to recompile the program so that this work is performed speculatively via a work-stealing system, and (iv) to detect at run time any attempt to perform operations that would reveal the presence of speculation. We assess the practicality of the approach through an implementation based on GHC 6.6, along with a limit study based on the execution profiles we gathered. We support the full Concurrent Haskell language compiled with traditional optimizations, including I/O operations and synchronization as well as pure computation. We use 20 of the larger programs from the 'nofib' benchmark suite. The limit study shows that programs vary greatly in the parallelism we can identify: some have none, 16 have a potential 2x speed-up, and 4 have a potential 32x speed-up. In practice, on a 4-core processor, we obtain 10-80% speed-ups on 7 programs. These gains come mainly from the addition of a second core rather than from further cores. The approach is therefore not a replacement for manual parallelization, but rather a way of squeezing extra performance out of the threads of an already-parallel program, or out of a program that has not yet been parallelized.
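
To make the execution model concrete, the sketch below (not code from the paper) marks work for speculative evaluation by hand, using GHC's `par` and `pseq` combinators from the `parallel` package's Control.Parallel module; the feedback-directed system described above inserts equivalent speculation automatically for the profile-selected work items. A sparked expression is placed on a work-stealing queue and may be evaluated by an idle core. The paper's run-time detection of I/O and synchronization under speculation has no counterpart in this pure-computation example.

```haskell
import Control.Parallel (par, pseq)

-- A hand-written analogue of the speculation the paper's system would
-- insert automatically: `par` sparks `left` onto the work-stealing
-- queue so an idle core may evaluate it, while the current core
-- evaluates `right`; `pseq` forces `right` before the final sum.
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = left `par` (right `pseq` left + right)
  where
    left  = fib (n - 1)
    right = fib (n - 2)

main :: IO ()
main = print (fib 30)
```

Built with `ghc -O2 -threaded` and run with `+RTS -N4`, the spark for `left` can be stolen by a second core, mirroring the abstract's observation that most of the measured gain appears when a second core joins in.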