Performance scalability of decoupled software pipelining

  • Authors: Ram Rangan; Neil Vachharajani; Guilherme Ottoni; David I. August

  • Affiliations: IBM Austin Research Laboratory, Austin, TX; Princeton University, Princeton, NJ; Princeton University, Princeton, NJ; Princeton University, Princeton, NJ

  • Venue: ACM Transactions on Architecture and Code Optimization (TACO)
  • Year: 2008

Abstract

Any successful solution to using multicore processors to scale general-purpose program performance will have to contend with rising intercore communication costs while exposing coarse-grained parallelism. Recently proposed pipelined multithreading (PMT) techniques have been demonstrated to have general-purpose applicability and are also able to effectively tolerate intercore latencies through pipelined interthread communication. These desirable properties make PMT techniques strong candidates for program parallelization on current and future multicore processors, and understanding their performance characteristics is critical to their deployment. To that end, this paper evaluates the performance scalability of a general-purpose PMT technique called decoupled software pipelining (DSWP) and presents a thorough analysis of the communication bottlenecks that must be overcome for optimal DSWP scalability.
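
To make the idea of pipelined interthread communication concrete, the following is a minimal sketch, not the authors' implementation. DSWP is a compiler transformation, whereas this example splits a loop by hand into two stages joined by a bounded queue. The names `producer_stage`, `consumer_stage`, `run_pipeline`, and `queue_depth`, and the squaring workload, are all hypothetical and chosen only for illustration. Because data flows in one direction only, the first stage never waits on the second unless the queue is full, which is the property that lets such a pipeline tolerate communication latency. The sketch shows the dependence structure only and says nothing about performance.

```python
import queue
import threading

# Hypothetical end-of-stream marker; not part of DSWP itself.
SENTINEL = object()


def producer_stage(items, out_q):
    """Stage 1: walks the input (the loop's traversal recurrence)."""
    for item in items:
        out_q.put(item)       # blocks only when the queue is full
    out_q.put(SENTINEL)       # tell the next stage that the loop has ended


def consumer_stage(in_q, results):
    """Stage 2: per-item work, kept off the traversal's critical path."""
    while True:
        item = in_q.get()     # blocks only when the queue is empty
        if item is SENTINEL:
            break
        results.append(item * item)  # placeholder workload (assumption)


def run_pipeline(items, queue_depth=32):
    """Run both stages on separate threads; data flows one way only."""
    q = queue.Queue(maxsize=queue_depth)
    results = []
    stage1 = threading.Thread(target=producer_stage, args=(items, q))
    stage2 = threading.Thread(target=consumer_stage, args=(q, results))
    stage1.start()
    stage2.start()
    stage1.join()
    stage2.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` yields the squares in input order, since a single consumer drains a FIFO queue. A deeper queue lets the first stage run further ahead of the second before it has to wait.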