T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
TAM—a compiler controlled threaded abstract machine
Journal of Parallel and Distributed Computing - Special issue on dataflow and multithreaded architectures
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A design study of the EARTH multiprocessor
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Control of loop parallelism in multithreaded code
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Design of storage hierarchy in multithreaded architectures
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Speculative Data-Driven Multithreading
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
(R) The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding
ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Data Prefetching and Data Forwarding in Shared Memory Multiprocessors
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02
Communication assist for data driven multithreading
PCI'01 Proceedings of the 8th Panhellenic conference on Informatics
A case for chip multiprocessors based on the data-driven multithreading model
International Journal of Parallel Programming
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Implementing Fine/Medium Grained TLP Support in a Many-Core Architecture
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Chip multiprocessor based on data-driven multithreading model
International Journal of High Performance Systems Architecture
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
A stream-computing extension to OpenMP
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
DDM-VMc: the data-driven multithreading virtual machine for the cell processor
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Resource-agnostic programming for many-core microgrids
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Hardware budget and runtime system for data-driven multithreaded chip multiprocessor
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Producer-Consumer: the programming model for future many-core processors
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Hi-index | 0.00 |
This paper describes the Data-Driven Multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-Driven Multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can be used to exploit cache management policies that reduce significantly cache misses. Such policies include firing a thread for execution only if its data is already placed in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation presented is a memory mapped hardware module that is attached directly to the processor's bus. This module is responsible for thread scheduling and is known as the Thread Synchronization Unit (TSU). The evaluation of DDM was performed using simulation of the Data-Driven Network of Workstations ({\rm{D}}^2{\rm{NOW}}). {\rm{D}}^2{\rm{NOW}} is a DDM implementation built out of regular workstations augmented with the TSU. The simulation was performed for nine scientific applications, seven of which belong to the SPLASH-2 suite. The results show that DDM can tolerate well both the communication and synchronization latency. Overall, for 16 and 32-node {\rm{D}}^2{\rm{NOW}} machines the speedup observed was 14.4 and 26.0, respectively.