This paper is an exploration of the parallel graph reduction approach to parallel functional programming, illustrated by a particular example: a pipelined, dynamically-scheduled implementation of searches, updates and read-modify-write transactions on an in-store binary search tree. We use program transformation, execution-driven simulation and analytical modelling to expose the maximum potential parallelism and the minimum communication and synchronisation overheads, and to control the overall space requirement. We begin with a lazy functional program specifying a series of transactions on a binary tree, each involving several searches and updates, in a side-effect-free fashion. Transformation of the source code produces a formulation of the program with greater locality and larger grain size than can be achieved using naive parallelisation methods, and we show that, with care, these tasks can be scheduled effectively. Even with a workload using random keys, significant spatial locality is found, and we evaluate a modified cache coherence protocol which avoids false sharing, so that large cache lines can be used to minimise the number of messages required. As expected with a pipeline, the application should reach a steady state as soon as the first transaction is completed. However, if the network latency is too large, the rate of completion lags behind the rate at which work is admitted, and internal queues grow without bound. We determine the conditions under which this occurs, and show how it can be avoided while maximising speedup.
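The side-effect-free formulation described above can be illustrated with a small sketch. This is not the paper's code (the paper works from a lazy functional program): it is a hypothetical Python rendering of a persistent binary search tree, in which every update returns a new tree that shares all unchanged subtrees with the old one. It is this absence of in-place mutation that makes a series of read-modify-write transactions safe to pipeline.

```python
# Illustrative sketch only: a persistent (side-effect-free) binary search
# tree. An update rebuilds just the O(log n) search path and shares every
# other subtree with the previous version of the tree.

class Node:
    __slots__ = ("key", "val", "left", "right")

    def __init__(self, key, val, left=None, right=None):
        self.key, self.val, self.left, self.right = key, val, left, right

def search(t, k):
    """Return the value stored under key k, or None if absent."""
    while t is not None:
        if k == t.key:
            return t.val
        t = t.left if k < t.key else t.right
    return None

def insert(t, k, v):
    """Persistent update: returns a new tree; the old tree is untouched."""
    if t is None:
        return Node(k, v)
    if k == t.key:
        return Node(k, v, t.left, t.right)
    if k < t.key:
        return Node(t.key, t.val, insert(t.left, k, v), t.right)
    return Node(t.key, t.val, t.left, insert(t.right, k, v))

def transaction(t, k, f, default=0):
    """Read-modify-write as one pure step: search, apply f, write back."""
    old = search(t, k)
    return insert(t, k, f(old if old is not None else default))
```

Because each transaction copies only its search path, earlier versions of the tree remain valid while later transactions begin, which is the sharing property a pipelined, dynamically-scheduled implementation can exploit.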