Programming a Multicore Architecture without Coherency and Atomic Operations

Authors:
Jochem H. Rutgers;Marco J. G. Bekooij;Gerard J. M. Smit
Affiliations:
University of Twente, Department of EEMCS, P.O. Box 217, 7500 AE Enschede, The Netherlands;University of Twente, Department of EEMCS, P.O. Box 217, 7500 AE Enschede, The Netherlands;University of Twente, Department of EEMCS, P.O. Box 217, 7500 AE Enschede, The Netherlands
Venue:
Proceedings of Programming Models and Applications on Multicores and Manycores
Year:
2014

Abstract

It is hard to reason about the state of a multicore system-on-chip: cores communicate via an interconnect such as a network-on-chip, so operations on memory need multiple cycles to complete. To simplify programming, atomicity is required, which is provided by atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex, but this complexity stems from the fact that they are designed with an imperative programming paradigm in mind, i.e., one based on threads that communicate via shared memory. In this paper, we show the impact on a multicore architecture when the programming paradigm is changed and a λ-calculus-based (functional) language is used instead. Ordering requirements on memory operations are relaxed and synchronization is simplified, because λ-calculus has no notion of state or memory and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need for hardware cache coherency, any atomic RMW operation, or mutexes; the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that concurrent hardware complexity can be reduced by making different choices in the software layers on top.
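
The claim that a pure, λ-calculus-style term imposes no ordering on the platform can be illustrated with a small sketch. It is not the authors' implementation or language: the names `Lit`, `Add`, `Mul`, `evaluate`, and `EXPR` are hypothetical, and Python is used only for illustration. The evaluator reduces the two operands of every node in a randomly chosen order. Because the term has no state, every order yields the same value.

```python
import random
from dataclasses import dataclass
from typing import Union


# Immutable expression nodes: a term has no mutable state to observe.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Add, Mul]


def evaluate(expr: Expr, rng: random.Random) -> int:
    """Evaluate a pure expression, reducing operands in a random order."""
    if isinstance(expr, Lit):
        return expr.value
    # Assumption for illustration: the coin flip stands in for whatever
    # order a scheduler or interconnect happens to impose on the work.
    if rng.random() < 0.5:
        left = evaluate(expr.left, rng)
        right = evaluate(expr.right, rng)
    else:
        right = evaluate(expr.right, rng)
        left = evaluate(expr.left, rng)
    return left + right if isinstance(expr, Add) else left * right


# (2 * 3) + ((1 + 4) * 7)
EXPR = Add(Mul(Lit(2), Lit(3)), Mul(Add(Lit(1), Lit(4)), Lit(7)))

# Collect the results of 100 differently ordered evaluations.
results = {evaluate(EXPR, random.Random(seed)) for seed in range(100)}
print(results)
```

All 100 evaluation orders collapse to a single result. The sketch runs sequentially, so it shows only order-independence of the result, not the paper's runtime, memory model, or scaling behavior.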