This paper presents the results of a simulation study of cache coherency issues in parallel implementations of functional programming languages. Parallel graph reduction uses a heap shared between processors for all synchronisation and communication. We show that a high degree of spatial locality is often present and that the rate of synchronisation is much greater than for imperative programs. We propose a modified coherency protocol with static cache line ownership and show that it exploits locality at least as well as a conventional protocol, but without the unnecessary serialisation and network transactions that a conventional protocol usually incurs. The new protocol avoids false sharing and makes it possible to reduce the number of messages exchanged, but it relies on increasing the size of the cache lines exchanged to do so. It is, therefore, of most benefit with a high-bandwidth interconnection network that has relatively high communication latencies or message handling overheads.
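To make the false-sharing cost mentioned above concrete, the following is a minimal sketch of a toy write-invalidate protocol. It is not the simulator or the protocol described in the paper; the function name `count_invalidations`, the trace format, and the word-addressed memory model are all assumptions made for illustration. It counts the invalidation messages sent when two processors write to different words that happen to share a cache line.

```python
def count_invalidations(trace, line_words):
    """Count invalidations under a toy write-invalidate protocol.

    trace: list of (processor, word_address, is_write) tuples (hypothetical format).
    line_words: number of words per cache line.
    """
    sharers = {}  # cache line index -> set of processors holding a valid copy
    invalidations = 0
    for proc, addr, is_write in trace:
        line = addr // line_words
        holders = sharers.setdefault(line, set())
        if is_write:
            # Every other holder of this line loses its copy.
            invalidations += len(holders - {proc})
            holders.clear()
        holders.add(proc)
    return invalidations


# Processor 0 writes word 0 and processor 1 writes word 1, alternately.
# The two processors never touch the same word.
trace = [(p, p, True) for _ in range(4) for p in (0, 1)]

print(count_invalidations(trace, line_words=4))  # words 0 and 1 share one line
print(count_invalidations(trace, line_words=1))  # each word has its own line
```

With four-word lines, every write after the first invalidates the other processor's copy even though no data is actually shared; with one-word lines no invalidations occur. This is the conventional-protocol trade-off between larger lines and false sharing that the abstract says the proposed protocol is designed to avoid.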