GRIP—A high-performance architecture for parallel graph reduction
Proceedings of the Conference on Functional Programming Languages and Computer Architecture
Concurrent Prolog
Non-strict languages: programming and implementation
The Computer Journal - Special issue on Lazy functional programming
Partial evaluation of pattern matching in strings
Information Processing Letters
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
CTDNet: A Mechanism for the Concurrent Execution of Lambda Graphs
IEEE Transactions on Software Engineering
Static and dynamic semantics processing
POPL '91 Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Design of the kernel language for the parallel inference machine
The Computer Journal - On concurrent logic programming
M-structures: extending a parallel, non-strict, functional language with state
Proceedings of the 5th ACM conference on Functional programming languages and computer architecture
Thread-based programming for the EM-4 hybrid dataflow machine
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Generating a compiler for a lazy language by partial evaluation
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partial evaluation and automatic program generation
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Design of cache memories for multi-threaded dataflow architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CTDNet III—an eager reduction model with laziness features
Abstract machine models for highly parallel computers
An introduction to partial evaluation
ACM Computing Surveys (CSUR)
Distributed partial evaluation
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
OMPI: optimizing MPI programs using partial evaluation
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Partial Evaluation and Mixed Computation: Proceedings of the IFIP TC2 Workshop, Gammel Avernaes, Denmark, 18-24 Oct., 1987
Enhancing Functional and Irregular Parallelism: Stateful Functions and their Semantics
International Journal of Parallel Programming
List Processing with a Data Flow Machine
Proceedings of RIMS Symposium on Software Science and Engineering
Functional, I-structure, and M-structure Implementations of NAS Benchmark FT
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Monads for Functional Programming
Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques, Tutorial Text
Design and performance evaluation of a multithreaded architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Performance Impacts of Caching I-Structure Data on Frame-Based Multithreaded Processing
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Extraction and Optimization of the Implicit Program Parallelism by Dynamic Partial Evaluation
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Exploiting Global Data Locality in Non-Blocking Multithreaded Architectures
ISPAN '97 Proceedings of the 1997 International Symposium on Parallel Architectures, Algorithms and Networks
I-Structure Software Cache: A Split-Phase Transaction Runtime Cache System
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Exploring the Parallelism Exposed by Partial Evaluation
Tuples and multiple return values in C++
Deriving a lazy abstract machine
Journal of Functional Programming
Exploiting single-assignment properties to optimize message-passing programs by code transformations
IFL'04 Proceedings of the 16th international conference on Implementation and Application of Functional Languages
This paper surveys and demonstrates the power of non-strict evaluation in applications executed on distributed architectures. We present the design, implementation, and experimental evaluation of single-assignment, incomplete data structures in a distributed-memory architecture and an Abstract Network Machine (ANM). Incremental Structures (IS), the Incremental Structure Software Cache (ISSC), and Dynamic Incremental Structures (DIS) provide non-strict data access and fully asynchronous operations, which make them well suited to exploiting fine-grain parallelism in distributed-memory systems. We focus on split-phase memory operations and non-strict information processing under a distributed address space to improve overall system performance. We also propose and describe a novel optimization technique at the communication level: partial evaluation of local and remote memory accesses, which not only removes much of the excess overhead of message passing but also reduces the number of messages when some information about the input, or part of the input, is known. We show that the split-phase transactions of IS, together with the ability to defer reads, allow partial evaluation of distributed programs without losing determinacy. Our experimental evaluation indicates that commodity PC clusters with both IS and a caching mechanism, ISSC, are more robust; the system delivers speedup for both regular and irregular applications. We also show that partial evaluation of memory accesses decreases traffic in the interconnection network and improves the performance of MPI IS and MPI ISSC applications.