An Efficient, Protected Message Interface

  • Authors: Whay Sing Lee; William J. Dally; Stephen W. Keckler; Nicholas P. Carter; Andrew Chang
  • Venue: Computer
  • Year: 1998

Abstract

With increasing demand for computing power, multiprocessing computers will become more common. In these systems, the growing discrepancy between processor and memory technologies will make tightly integrated message interfaces essential for achieving the necessary efficiency, which is especially important in light of the growing interest in software-distributed, shared-memory systems. In traditional message interfaces, high latency and processor occupancy inhibit the exploitation of large-scale parallelism. Newer designs address this problem by removing OS layers from the interface, but the remaining overhead is still large. To amortize this overhead, programmers use messages that are hundreds to thousands of words in size. Consequently, threads run for thousands of cycles between communications, which precludes much parallelization. When designers incorporate multiple hardware thread slots onto each node, this overhead is exacerbated if primitive support for fair and protected resource allocation is lacking. Much of the communication overhead can be removed by making carefully complementary design choices in the primitive messaging mechanisms, so that messages as short as several words become practical and fine-grain parallelism becomes possible. The authors conduct a performance evaluation of several primitive messaging mechanisms: dispatch mechanisms (how the processor reacts to message arrivals), memory-mapped versus register-mapped interfaces, and streaming versus buffered interfaces. They baseline these results against the MIT M-Machine and its tightly integrated message interfaces. They find that a message can be dispatched up to 18 times faster by reserving a hardware thread context for message reception instead of using an interrupt-driven interface. They also find that the mapping decision is important, with integrated register-mapped interfaces as much as 3.5 times more efficient than conventional systems.
To meet the challenges and exploit the opportunities presented by emerging multithreaded processor architectures, low-overhead mechanisms for protection against message corruption, interception, and starvation must be integral to the message system design. The authors hope that the simple messaging mechanisms described in this article can help provide a solution to these challenges.