Evaluating the locality benefits of active messages

Authors:
Ellen Spertus;William J. Dally
Affiliations:
Microsoft Research, 1 Microsoft Way, Redmond, WA and Laboratory for Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts;Laboratory for Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995

Abstract

A major challenge in fine-grained computing is achieving locality without excessive scheduling overhead. We built two J-Machine implementations of a fine-grained programming model, the Berkeley Threaded Abstract Machine. One implementation takes an Active Messages approach, maintaining a scheduling hierarchy in software to improve data cache performance. The other relies on the J-Machine's hardware message queues and fast task switch, lowering control costs at the expense of data locality. Our analysis measures the costs and benefits of each approach for a variety of programs and cache configurations. The Active Messages implementation is strongest when miss penalties are high and for the finest-grained programs. The hardware-buffered implementation is strongest with direct-mapped caches, where it achieves substantially better instruction cache performance.