Efficient shared-memory support for parallel graph reduction

Authors:
Andrew J. Bennett;Paul H. J. Kelly
Venue:
Future Generation Computer Systems
Year:
1997

Abstract

This paper presents the results of a simulation study of cache coherency issues in parallel implementations of functional programming languages. Parallel graph reduction uses a heap shared between processors for all synchronisation and communication. We show that a high degree of spatial locality is often present and that the rate of synchronisation is much greater than for imperative programs. We propose a modified coherency protocol with static cache line ownership and show that this allows locality to be exploited to at least the level of a conventional protocol, but without the unnecessary serialisation and network transactions this usually causes. The new protocol avoids false sharing, and makes it possible to reduce the number of messages exchanged, but relies on increasing the size of the cache lines exchanged to do so. It is, therefore, of most benefit with a high-bandwidth interconnection network with relatively high communication latencies or message handling overheads.
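The false sharing mentioned in the abstract can be illustrated with a toy model. The sketch below is not the protocol proposed in the paper. It assumes a simplified write-invalidate scheme in which each cache line has a single current holder, and it counts how often a line must move between processors. The function name `count_ownership_transfers`, the word-addressed memory, and the write trace are all hypothetical choices made for illustration.

```python
def count_ownership_transfers(writes, line_words):
    """Count line migrations in a toy write-invalidate model.

    writes     -- sequence of (processor_id, word_address) pairs, in order
    line_words -- number of words per cache line (assumed parameter)

    A write to a line currently held by a different processor counts as
    one transfer. The first write to a line (a cold miss) is not counted.
    """
    holder = {}      # line index -> processor that currently holds the line
    transfers = 0
    for proc, addr in writes:
        line = addr // line_words
        if line in holder and holder[line] != proc:
            transfers += 1
        holder[line] = proc
    return transfers


# Two processors alternately write to their own, distinct words.
trace = [(0, 0), (1, 1)] * 4

# One word per line: the words live in different lines, so nothing migrates.
print(count_ownership_transfers(trace, line_words=1))

# Two words per line: both words share a line, so it bounces on every write
# after the first, even though no data is actually shared.
print(count_ownership_transfers(trace, line_words=2))
```

In this model the processors never touch the same word, yet the larger line size turns every write after the first into a transfer. That is the kind of unnecessary serialisation and network traffic the abstract attributes to conventional protocols when line size is increased to exploit spatial locality.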