With the emergence of commodity multicore architectures, exploiting tightly-coupled parallelism has become increasingly important. Functional programming languages, such as Haskell, are in principle well placed to take advantage of this trend, since they make it easy to identify large amounts of fine-grained parallelism. Unfortunately, real performance benefits have often proved hard to realise in practice. This paper reports on a new approach using middleware built on the Eden parallel dialect of Haskell. Our approach is "low pain" in the sense that the programmer constructs a parallel program by inserting a small number of higher-order algorithmic skeletons at key points in the program. It is "high gain" in the sense that we obtain good parallel speedups. Our approach is unusual in that we do not attempt to use shared memory directly, but instead coordinate parallel computations using a message-passing implementation. This has a number of advantages. Firstly, coordination, i.e. locking and communication, is confined to limited shared-memory areas, essentially the communication buffers, and is isolated within well-understood libraries. Secondly, the coarse thread granularity that we obtain reduces coordination overheads, so locks are normally needed only on (relatively large) messages, and not on individual data items, as is often the case for simple shared-memory implementations. Finally, cache-coherency requirements are reduced, since individual tasks do not share caches and can garbage-collect independently. We report results for two representative computational algebra problems. Computational algebra is a challenging application area that has not been widely studied in the general parallelism community. Computational algebra applications have high computational demands and are, in principle, often suitable for parallel execution, but they usually display a high degree of irregularity in both task and data structure, which makes it difficult to construct parallel applications that perform well in practice. Using our system, we obtain both extremely good processor utilisation (97%) and very good absolute speedups (up to 7.7) on an eight-core machine.
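To make the skeleton-based programming style concrete, the following is a minimal sketch in Eden-style Haskell. It assumes the Control.Parallel.Eden interface (edenmodules package on an Eden-enabled GHC); the skeleton name, worker function, and example workload are illustrative assumptions, not the skeletons or applications used in the paper.

-- A minimal sketch of the skeleton-based style described above, assuming
-- the Control.Parallel.Eden interface (edenmodules package, Eden-enabled
-- GHC). Names and workload below are illustrative, not taken from the paper.
module Main where

import Control.Parallel.Eden (Trans, process, spawn)

-- A simple parallel-map skeleton: each list element is handed to a separate
-- Eden process, and inputs and results are exchanged by message passing
-- rather than through shared memory.
parMapSketch :: (Trans a, Trans b) => (a -> b) -> [a] -> [b]
parMapSketch f xs = spawn (map (const (process f)) xs) xs

-- Hypothetical coarse-grained task standing in for a computer-algebra
-- computation; the applications in the paper are far larger and more irregular.
heavyTask :: Integer -> Integer
heavyTask n = sum [ k * k | k <- [1 .. n] ]

main :: IO ()
main = print (sum (parMapSketch heavyTask (replicate 8 2000000)))

The point of the sketch is simply that parallelism is introduced by replacing a sequential map with a single higher-order skeleton call, while all coordination, including locking and communication, remains inside the runtime's message-passing layer, as the abstract describes.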