Hiding communication latency and coherence overhead in software DSMs

  • Authors:
  • R. Bianchini;L. I. Kontothanassis;R. Pinto;M. De Maria;M. Abud;C. L. Amorim

  • Affiliations:
  • COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil 21945-970;Department of Computer Science, University of Rochester, Rochester, New York;COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil 21945-970;COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil 21945-970;COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil 21945-970;COPPE Systems Engineering, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil 21945-970

  • Venue:
  • Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
  • Year:
  • 1996

Abstract

In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial, but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can improve running time by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well as or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP2 parallel system at COPPE/UFRJ.
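
For readers unfamiliar with the term, the "diffs" mentioned in the abstract can be illustrated with a minimal software sketch of the twin-and-diff idea generally associated with TreadMarks-style DSMs: a copy (twin) of a page is kept before it is modified, and the diff records only the words that differ afterwards. This is an illustration under stated assumptions, not the hardware mechanism the paper proposes. The function names, the 4-byte comparison granularity, and the (offset, data) encoding are all hypothetical choices made for this example.

```python
from typing import List, Tuple

WORD = 4  # assumed comparison granularity in bytes (hypothetical choice)

Diff = List[Tuple[int, bytes]]  # runs of modified data as (byte offset, new bytes)


def make_twin(page: bytearray) -> bytes:
    """Snapshot a page before it is written, so changes can be found later."""
    return bytes(page)


def create_diff(twin: bytes, page: bytes) -> Diff:
    """Compare twin and page word by word and collect runs of changed words."""
    assert len(twin) == len(page)
    diff: Diff = []
    run_start = None
    for off in range(0, len(page), WORD):
        changed = page[off:off + WORD] != twin[off:off + WORD]
        if changed and run_start is None:
            run_start = off  # a new run of modified words begins here
        elif not changed and run_start is not None:
            diff.append((run_start, bytes(page[run_start:off])))
            run_start = None
    if run_start is not None:  # run extends to the end of the page
        diff.append((run_start, bytes(page[run_start:])))
    return diff


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Patch another copy of the page with the recorded modifications."""
    for off, data in diff:
        page[off:off + len(data)] = data


if __name__ == "__main__":
    page = bytearray(32)             # a tiny 32-byte "page" for illustration
    twin = make_twin(page)
    page[4:12] = bytes(range(1, 9))  # modify two adjacent words
    page[24] = 0xFF                  # modify one byte in a later word
    diff = create_diff(twin, page)
    remote = bytearray(twin)         # stale copy held by another node
    apply_diff(remote, diff)
    print(diff, remote == page)
```

In a software-only DSM, the word-by-word comparison and the patching both run on the computation processor; the abstract identifies offloading this work to the protocol controller as the largest source of benefit.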