Data-Driven Multithreading Using Conventional Microprocessors

  • Authors:
  • Costas Kyriacou;Paraskevas Evripidou;Pedro Trancoso

  • Affiliations:
  • -;IEEE;IEEE

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2006

Abstract

This paper describes the Data-Driven Multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-Driven Multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can be used to exploit cache management policies that significantly reduce cache misses. Such policies include firing a thread for execution only if its data is already in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation presented is a memory-mapped hardware module that is attached directly to the processor's bus. This module, known as the Thread Synchronization Unit (TSU), is responsible for thread scheduling. DDM was evaluated by simulating the Data-Driven Network of Workstations (D²NOW), a DDM implementation built out of regular workstations augmented with the TSU. The simulation was performed for nine scientific applications, seven of which belong to the SPLASH-2 suite. The results show that DDM tolerates both communication and synchronization latency well. Overall, the observed speedups for 16- and 32-node D²NOW machines were 14.4 and 26.0, respectively.
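The scheduling idea in the abstract can be illustrated with a small software sketch. It is only a toy model, not the paper's design: the TSU described above is a hardware module, whereas the class `ToyTSU`, its methods, the per-thread counter of missing inputs, and the `in_cache` and `prefetch` callbacks are all hypothetical names and mechanisms assumed here for illustration. A thread becomes ready once all of its inputs have been produced, and it is fired only when its data is reported as cached, which loosely mirrors the CacheFlow policy.

```python
from collections import deque


class ToyTSU:
    """Toy software stand-in for a thread synchronization unit (hypothetical API)."""

    def __init__(self):
        self.ready_count = {}  # thread id -> number of inputs still missing
        self.consumers = {}    # thread id -> threads that consume its result
        self.body = {}         # thread id -> callable that runs to completion
        self.ready = deque()   # threads whose inputs have all arrived

    def add_thread(self, tid, body, n_inputs, consumers=()):
        self.ready_count[tid] = n_inputs
        self.consumers[tid] = list(consumers)
        self.body[tid] = body
        if n_inputs == 0:
            self.ready.append(tid)

    def run(self, in_cache=lambda tid: True, prefetch=lambda tid: None):
        """Fire ready threads; defer any thread whose data is not cached.

        Assumes prefetch(tid) eventually makes in_cache(tid) true,
        otherwise this loop would never terminate.
        """
        order = []
        while self.ready:
            tid = self.ready.popleft()
            if not in_cache(tid):
                prefetch(tid)            # request the data, try again later
                self.ready.append(tid)
                continue
            self.body[tid]()             # nonblocking: runs to completion
            order.append(tid)
            for consumer in self.consumers[tid]:
                self.ready_count[consumer] -= 1
                if self.ready_count[consumer] == 0:
                    self.ready.append(consumer)
        return order


# Usage: thread "c" multiplies the values produced by "a" and "b".
results = {}
tsu = ToyTSU()
tsu.add_thread("a", lambda: results.update(a=2), 0, consumers=["c"])
tsu.add_thread("b", lambda: results.update(b=3), 0, consumers=["c"])
tsu.add_thread("c", lambda: results.update(c=results["a"] * results["b"]), 2)

cached = {"b"}  # pretend only b's data starts out in the cache
order = tsu.run(in_cache=lambda t: t in cached, prefetch=cached.add)
print(order, results["c"])
```

In this sketch, thread "a" is deferred because its data is not yet cached, so "b" fires first. Thread "c" cannot become ready until both producers have finished, which is the data-driven part of the model.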