Clustered Decoupled Software Pipelining on Commodity CMP

  • Authors:
  • Yuanming Zhang;Kanemitsu Ootsu;Takashi Yokota;Takanobu Baba

  • Venue:
  • ICPADS '08 Proceedings of the 2008 14th IEEE International Conference on Parallel and Distributed Systems
  • Year:
  • 2008

Abstract

With the prevalence of chip multiprocessors (CMPs) in server and client computers, using multiple cores to speed up existing sequential programs has become an important issue. Decoupled Software Pipelining (DSWP) is a recently proposed technique that extracts non-speculative threads from sequential programs for higher performance. However, this technique is not effective on commodity CMP architectures, because the inter-thread communication and synchronization overhead often offsets the benefit of parallelization. To reduce this overhead without modifying the CMP architecture, this paper presents clustered DSWP (CDSWP), an extension of DSWP. By communicating a set of dependent data items instead of a single dependent data item, this technique transforms a sequential program into a clustered thread pipeline. Here "clustered" means that several dependent data items are grouped together as one communication unit. The advantage of this technique is that it can eliminate false sharing and reduce the average cache latency, which greatly reduces the overhead. In preliminary experiments on several commodity CMP architectures, we achieved loop speedups ranging from 16% to 58% on some SPEC2000 benchmark programs.
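
The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `run_pipeline`, the `cluster_size` parameter, and the per-stage arithmetic are all hypothetical, and Python threads do not reproduce cache or false-sharing effects. The sketch only shows how grouping dependent items into one communication unit reduces the number of inter-thread transfers in a two-stage pipeline.

```python
import queue
import threading


def run_pipeline(items, cluster_size):
    """Run a two-stage thread pipeline over `items`.

    Stage 1 computes a value per item; stage 2 consumes those values.
    Values cross the thread boundary in clusters of up to `cluster_size`
    items. Returns (stage-2 result, number of cluster transfers).
    All names and the per-stage work are illustrative assumptions.
    """
    channel = queue.Queue()
    transfers = 0
    result = []

    def producer():
        nonlocal transfers
        cluster = []
        for x in items:
            cluster.append(x * 2)          # hypothetical stage-1 work
            if len(cluster) == cluster_size:
                channel.put(cluster)       # one synchronization per cluster
                transfers += 1
                cluster = []
        if cluster:                        # flush a final partial cluster
            channel.put(cluster)
            transfers += 1
        channel.put(None)                  # sentinel: no more data

    def consumer():
        total = 0
        while True:
            cluster = channel.get()
            if cluster is None:
                break
            for value in cluster:
                total += value + 1         # hypothetical stage-2 work
        result.append(total)

    stage1 = threading.Thread(target=producer)
    stage2 = threading.Thread(target=consumer)
    stage1.start()
    stage2.start()
    stage1.join()
    stage2.join()
    return result[0], transfers


if __name__ == "__main__":
    # cluster_size=1 mimics sending one dependent item at a time;
    # a larger cluster_size mimics the clustered communication unit.
    print(run_pipeline(range(1000), 1))
    print(run_pipeline(range(1000), 64))
```

With `cluster_size=1` every item costs one queue operation, while a larger cluster amortizes that cost over many items; both settings produce the same stage-2 result.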