Communication assist for data driven multithreading

Authors:
Costas Kyriacou;Paraskevas Evripidou
Affiliations:
Computer Engineering Depart., Frederick Institute of Technology, Nicosia, Cyprus;Department of Computer Science, University of Cyprus, Nicosia, Cyprus
Venue:
PCI'01 Proceedings of the 8th Panhellenic conference on Informatics
Year:
2001

Abstract

Latency tolerance is one of the main concerns in parallel processing. Data Driven Multithreading, a technique that uses extra hardware to schedule threads for execution based on data availability, improves performance through latency tolerance. With Data Driven Multithreading, a thread is scheduled for execution only if all of its inputs have been produced and placed in the processor's local memory. Communication and synchronization are decoupled from the computation portions of a program, i.e. they execute asynchronously. Thus, an executing thread experiences no synchronization or communication latencies. The processor can, though, be idle when there are no threads ready for execution. Communication latencies are therefore difficult to hide completely in applications with a high communication-to-computation ratio. This paper presents three mechanisms for the implementation of the communication assist of a Data Driven Multithreaded architecture. The first mechanism relies only on fine grain communication, where each packet can transfer a single value. With the second mechanism, the communication assist is modified to support block data communication through the same fine grain interconnection network of the first configuration. The third mechanism employs a broadcast network such as Ethernet to transfer blocks of data, while fine grain communication is handled the same way as with the other two mechanisms.
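
The scheduling rule described in the abstract (a thread becomes ready only once all of its inputs have arrived in local memory) can be illustrated with a small software model. This is only a sketch under stated assumptions: the paper describes dedicated hardware, and the class name `DataDrivenScheduler`, its methods, and the per-thread counter of missing inputs are hypothetical names chosen for illustration, not taken from the paper.

```python
from collections import deque


class DataDrivenScheduler:
    """Toy model: a thread is runnable only when all its inputs have arrived.

    All names here are hypothetical; the real mechanism is implemented in
    hardware alongside the processor.
    """

    def __init__(self):
        self.pending = {}     # thread id -> number of inputs still missing
        self.inputs = {}      # thread id -> {slot: value}, stands in for local memory
        self.bodies = {}      # thread id -> function computing the thread's result
        self.ready = deque()  # threads whose inputs are all present

    def add_thread(self, tid, n_inputs, body):
        self.pending[tid] = n_inputs
        self.inputs[tid] = {}
        self.bodies[tid] = body
        if n_inputs == 0:
            self.ready.append(tid)

    def deliver(self, tid, slot, value):
        """Record an arriving value, independently of any running thread."""
        if slot in self.inputs[tid]:
            raise ValueError(f"input {slot!r} of thread {tid!r} already delivered")
        self.inputs[tid][slot] = value
        self.pending[tid] -= 1
        if self.pending[tid] == 0:
            self.ready.append(tid)

    def run_one(self):
        """Run one ready thread, or return None if the processor would idle."""
        if not self.ready:
            return None
        tid = self.ready.popleft()
        return tid, self.bodies[tid](self.inputs[tid])


sched = DataDrivenScheduler()
sched.add_thread("sum", 2, lambda v: v["a"] + v["b"])

sched.deliver("sum", "a", 3)
idle = sched.run_one()    # only one of two inputs present: nothing to run

sched.deliver("sum", "b", 4)
result = sched.run_one()  # both inputs present: the thread executes
```

In this model, the first `run_one` call returns `None`, which corresponds to the idle case the abstract mentions: when no thread has all of its inputs, there is nothing to overlap with the outstanding communication.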