Reducing the Communication Overhead of Dynamic Applications on Shared Memory Multiprocessors

  • Authors:
  • A. Sivasubramaniam

  • Affiliations:
  • -

  • Venue:
  • HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
  • Year:
  • 1997

Abstract

Shared memory machines offer the convenience of a shared address space. This makes them particularly appealing for applications with dynamic communication behavior, since the mechanisms for data transfer between processors are hidden from the programmer. But the scalability of these machines is often limited by the latencies incurred in accessing locations in remote memories. Caches alleviate this problem by exploiting the temporal and spatial locality in an application. However, the traffic induced by maintaining coherence can significantly limit performance. Invalidation-based protocols for coherence maintenance are conservative and always resort to receiver-initiated communication. Thus the receiver may have to experience the entire latency of the data transfer even though the data item may have been available much earlier. Update-based schemes, though sender-initiated, can incur high write overheads by sending redundant updates to processors that may not need them. The goal of this research is to reduce the read and write latencies of applications with dynamic communication behavior by employing intelligent sender-initiated data transfer mechanisms. In the process, we would like to keep our demands on the programmer, the compiler, and the hardware as low as possible. Towards this goal, we present a set of write primitives that lower the communication overhead for shared memory accesses governed by locks. We demonstrate the performance benefits of these primitives using a database application drawn from the Geographical Information Systems (GIS) domain. We explore the competitive update mechanism for the remaining shared memory accesses. Using a set of applications, we examine the amount of history that we need to maintain for an effective competitive update scheme. We also show how this effective scheme can be implemented in software on emerging shared memory architectures with little hardware support.
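
The abstract does not describe how the competitive update mechanism works internally. As a rough illustration only, the Python sketch below models one common textbook formulation: each sharer counts the updates it has received since its last local read and drops its copy once that count reaches a threshold, after which it receives no further updates. The class names, the `threshold` parameter (standing in for the "amount of history"), and the `updates_sent` counter are all assumptions made for this sketch and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class CachedCopy:
    """One processor's cached copy of a shared block (hypothetical model)."""
    valid: bool = True
    unused_updates: int = 0  # updates received since the last local read


@dataclass
class CompetitiveUpdateBlock:
    """Toy model of one shared block under a competitive update policy."""
    threshold: int                               # assumed tunable history limit
    value: int = 0
    copies: dict = field(default_factory=dict)   # processor id -> CachedCopy
    updates_sent: int = 0                        # update messages, i.e. write traffic

    def read(self, pid: int) -> int:
        copy = self.copies.get(pid)
        if copy is None or not copy.valid:
            # Miss: receiver-initiated fetch, as in an invalidation protocol.
            copy = CachedCopy()
            self.copies[pid] = copy
        copy.unused_updates = 0  # a local read shows the copy is still in use
        return self.value

    def write(self, writer: int, value: int) -> None:
        self.value = value
        for pid, copy in self.copies.items():
            if pid == writer or not copy.valid:
                continue
            self.updates_sent += 1       # sender-initiated update to a sharer
            copy.unused_updates += 1
            if copy.unused_updates >= self.threshold:
                copy.valid = False       # sharer drops out of the update set
```

In this model a small `threshold` behaves more like an invalidation protocol (sharers drop out quickly, so writes are cheap but later reads miss), while a large `threshold` behaves more like a pure update protocol (reads hit, but writes keep sending updates that may be redundant).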