It is hard to reason about the state of a multicore system-on-chip, because cores communicate via an interconnect such as a network-on-chip, so operations on memory need multiple cycles to complete. To simplify programming, atomicity is usually provided by means of atomic read-modify-write (RMW) operations, a strong memory model, and hardware cache coherency. As a result, multicore architectures are very complex. This complexity stems from the fact that they are designed with an imperative programming paradigm in mind, i.e. one based on threads that communicate via shared memory. In this paper, we show the impact on a multicore architecture when the programming paradigm is changed and a λ-calculus-based (functional) language is used instead. Ordering requirements of memory operations are more relaxed and synchronization is simplified, because λ-calculus has no notion of state or memory and therefore does not impose ordering requirements on the platform. We implemented a functional language for multicores with a weak memory model, without the need for hardware cache coherency, any atomic RMW operation, or mutexes; the execution is atomic-free. Experiments show that even on a system with (transparently applied) software cache coherency, execution scales properly up to 32 cores. This shows that the complexity of concurrent hardware can be reduced by making different choices in the software layers on top.