For fine-grain computation to be effective, the cost of communication among its many subtasks must be minimized. In this paper, we present an optimization technique that reduces the overhead of communication between local subtasks by bypassing the network interface and transferring data directly from memory or registers to memory. In instruction-level simulations of six benchmark programs on 1 to 32 nodes, the optimization improves total execution time by 35.6% on average.
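The bypass idea can be illustrated with a minimal sketch. It is not the paper's implementation: the `Node` class, its `send` and `drain` methods, and the counters are hypothetical names introduced here, and the sketch assumes that "local" means the sender and receiver subtasks reside on the same node. A send to the local node is written straight into memory, while a send to another node is queued at a simulated network interface.

```python
from collections import deque


class Node:
    """Hypothetical node with local memory and a simulated network interface."""

    def __init__(self, node_id, memory_size=16):
        self.node_id = node_id
        self.memory = [0] * memory_size
        self.ni_queue = deque()   # messages handed to the network interface
        self.ni_sends = 0         # sends that paid the interface overhead
        self.direct_writes = 0    # sends that bypassed the interface

    def send(self, dest, addr, value):
        """Deliver value to dest.memory[addr], bypassing the interface if local."""
        if dest is self:
            # Local destination: store directly into memory, no message is built.
            self.memory[addr] = value
            self.direct_writes += 1
        else:
            # Remote destination: enqueue a message at the network interface.
            self.ni_queue.append((dest, addr, value))
            self.ni_sends += 1

    def drain(self):
        """Simulate the network delivering all queued messages."""
        while self.ni_queue:
            dest, addr, value = self.ni_queue.popleft()
            dest.memory[addr] = value


a, b = Node(0), Node(1)
a.send(a, 3, 42)   # local: written immediately
a.send(b, 5, 7)    # remote: waits in the interface queue
a.drain()          # remote message is delivered
```

In this sketch the local send completes as a single store, which is the overhead the optimization targets; the remote path is unchanged.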