GRIP—A high-performance architecture for parallel graph reduction
Proceedings of the Conference on Functional Programming Languages and Computer Architecture
Concurrent Prolog
Non-strict languages: programming and implementation
The Computer Journal - Special issue on Lazy functional programming
Partial evaluation of pattern matching in strings
Information Processing Letters
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
CTDNet: A Mechanism for the Concurrent Execution of Lambda Graphs
IEEE Transactions on Software Engineering
Static and dynamic semantics processing
POPL '91 Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Design of the kernel language for the parallel inference machine
The Computer Journal - On concurrent logic programming
M-structures: extending a parallel, non-strict, functional language with state
Proceedings of the 5th ACM conference on Functional programming languages and computer architecture
Thread-based programming for the EM-4 hybrid dataflow machine
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Generating a compiler for a lazy language by partial evaluation
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Partial evaluation and automatic program generation
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Design of cache memories for multi-threaded dataflow architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
CTDNet III—an eager reduction model with laziness features
Abstract machine models for highly parallel computers
An introduction to partial evaluation
ACM Computing Surveys (CSUR)
Distributed partial evaluation
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
OMPI: optimizing MPI programs using partial evaluation
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Partial Evaluation and Mixed Computation: Proceedings of the IFIP TC2 Workshop, Gammel Avernaes, Denmark, 18-24 Oct., 1987
Enhancing Functional and Irregular Parallelism: Stateful Functions and their Semantics
International Journal of Parallel Programming
List Processing with a Data Flow Machine
Proceedings of RIMS Symposium on Software Science and Engineering
Functional, I-structure, and M-structure Implementations of NAS Benchmark FT
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Monads for Functional Programming
Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques, Tutorial Text
Design and performance evaluation of a multithreaded architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Performance Impacts of Caching I-Structure Data on Frame-Based Multithreaded Processing
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Extraction and Optimization of the Implicit Program Parallelism by Dynamic Partial Evaluation
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Exploiting Global Data Locality in Non-Blocking Multithreaded Architectures
ISPAN '97 Proceedings of the 1997 International Symposium on Parallel Architectures, Algorithms and Networks
I-Structure Software Cache: A Split-Phase Transaction Runtime Cache System
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Exploring the Parallelism Exposed by Partial Evaluation
Tuples and multiple return values in C++
Deriving a lazy abstract machine
Journal of Functional Programming
Exploiting single-assignment properties to optimize message-passing programs by code transformations
IFL'04 Proceedings of the 16th international conference on Implementation and Application of Functional Languages
This paper surveys and demonstrates the power of non-strict evaluation in applications executed on distributed architectures. We present the design, implementation, and experimental evaluation of single-assignment, incomplete data structures in a distributed-memory architecture and an Abstract Network Machine (ANM). Incremental Structures (IS), the Incremental Structure Software Cache (ISSC), and Dynamic Incremental Structures (DIS) provide non-strict data access and fully asynchronous operations, which make them well suited to exploiting fine-grain parallelism in distributed-memory systems. We focus on split-phase memory operations and non-strict information processing under a distributed address space to improve overall system performance. We also propose and describe a novel optimization technique at the communication level: partial evaluation of local and remote memory accesses, which not only removes much of the excess overhead of message passing but also reduces the number of messages when some information about the input, or part of the input, is known. We show that the split-phase transactions of IS, together with the ability to defer reads, allow partial evaluation of distributed programs without losing determinacy. Our experimental evaluation indicates that commodity PC clusters with both IS and a caching mechanism, ISSC, are more robust; the system delivers speedup for both regular and irregular applications. We also show that partial evaluation of memory accesses decreases traffic in the interconnection network and improves the performance of MPI IS and MPI ISSC applications.