Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems

  • Authors:
  • Angelos Bilas;Cheng Liao;Jaswinder Pal Singh

  • Affiliations:
  • Dept. of Elec. and Comp. Eng., 10 King's College Road, University of Toronto, Toronto, ON M5S 3G4, Canada;Dept. of Computer Science, 35 Olden Street, Princeton University, Princeton, NJ;Dept. of Computer Science, 35 Olden Street, Princeton University, Princeton, NJ

  • Venue:
  • ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

The performance of page-based software shared virtual memory (SVM) is still far from that achieved on hardware-coherent distributed shared memory (DSM) systems. The interrupt cost for asynchronous protocol processing has been found to be a key source of performance loss and complexity.This paper shows that by providing simple and general support for asynchronous message handling in a commodity network interface (NI), and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. They also require neither visibility into the node memory system nor code instrumentation to identify memory operations. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support in a shared Memory Abstraction), on a cluster of SMPs with a programmable NI, though the mechanisms are simple and do not require programmability.We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications. Application performance improves by about 37% on average for reasonably well performing applications, even on our relatively slow programmable NI, and more for others. We discuss the key remaining bottlenecks at the protocol level and use a firmware performance monitor in the NI to understand the interactions with and the implications for the communication layer.