Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
The effect of context switches on cache performance
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Global analysis for partitioning non-strict programs into sequential threads
LFP '92 Proceedings of the 1992 ACM conference on LISP and functional programming
TAM—a compiler controlled threaded abstract machine
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Optimistic active messages: a mechanism for scheduling communication with computation
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A Multithreaded Implementation of Id using P-RISC Graphs
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Analyzing the benefits of a separate processor to handle messages for fine-grain multithreading
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributeed Processing
Distributed Shared Abstractions (DSA) on Multiprocessors
IEEE Transactions on Software Engineering
pHluid: the design of a parallel functional language implementation on workstations
Proceedings of the first ACM SIGPLAN international conference on Functional programming
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of Various Node Configurations for Fine-grain Multithreading on Stock Processors
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Hi-index | 0.00 |
A major challenge in fine-grained computing is achieving locality without excessive scheduling overhead. We built two J-Machine implementations of a fine-grained programming model, the Berkeley Threaded Abstract Machine. One implementation takes an Active Messages approach, maintaining a scheduling hierarchy in software in order to improve data cache performance. Another approach relies on the J-Machine's message queues and fast task switch, lowering the control costs at the expense of data locality. Our analysis measures the costs and benefits of each approach, for a variety of programs and cache configurations. The Active Messages implementation is strongest when miss penalties are high and for the finest-grained programs. The hardware-buffered implementation is strongest in direct-mapped caches, where it achieves substantially better instruction cache performance.