Executing functional programs on a virtual tree of processors

Authors:
F. Warren Burton; M. Ronan Sleep
Affiliations:
Computing Studies Sector, University of East Anglia, Norwich NR4 7TJ, England; Computing Studies Sector, University of East Anglia, Norwich NR4 7TJ, England
Venue:
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Year:
1981


Abstract

A wide variety of computational models, including the lambda calculus, may be represented by a set of reduction rules which guide the (run-time) construction of a process tree. Even a single source of parallelism in an otherwise lazy evaluator may give rise to an exponential growth in the process tree, which must eventually overwhelm any finite architecture. We present a simple model for concurrently executing such process trees, which gives us a basis for matching the production of new tasks to the available resources. In addition, we present a generalised interpretation of a familiar topology suited to the support of large, perhaps irregular, virtual process trees on a much smaller physical network.
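
The idea of matching task production to available resources can be illustrated with a small simulation. The sketch below is not the authors' model: the function `fib_tree`, the shared `budget` counter, and the choice of Fibonacci as the workload are all assumptions made for illustration. Each node of the process tree turns a child into a new task only while the budget of free (hypothetical) processors is non-zero; otherwise the parent evaluates the child inline.

```python
import math


def fib_tree(n, budget):
    """Evaluate fib(n) as a process tree with throttled task creation.

    `budget` is a one-element list holding the number of free processors
    (a hypothetical resource counter, shared by the whole tree). A child
    becomes a new task only while budget remains; otherwise it is
    evaluated inline by its parent. The budget is never returned in this
    simplified sketch, so it bounds the total number of tasks created.

    Returns (value, tasks_spawned).
    """
    if n < 2:
        return n, 0
    total = 0
    spawned = 0
    for child in (n - 1, n - 2):
        if budget[0] > 0:
            # Resources are available: count this child as a new task.
            budget[0] -= 1
            spawned += 1
        # In this sequential simulation the child is evaluated here either
        # way; only the task accounting differs.
        value, below = fib_tree(child, budget)
        total += value
        spawned += below
    return total, spawned


if __name__ == "__main__":
    # Unthrottled: every child becomes a task, so tasks grow exponentially.
    print(fib_tree(10, [math.inf]))
    # Throttled: at most four tasks are ever created.
    print(fib_tree(10, [4]))
```

With an unlimited budget the number of tasks equals the number of non-root nodes in the call tree, which grows exponentially in `n`; with a finite budget the same value is computed while task creation stays within the stated limit. A real system would also release resources when tasks finish, which this sketch omits.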