Windows NT in a ccNUMA system

  • Authors:
  • B. Brock; G. Carpenter; E. Chiprout; E. Elnozahy; M. Dean; D. Glasco; J. Peterson; R. Rajamony; F. Rawson; R. Rockhold; A. Zimmerman

  • Affiliations:
  • IBM Austin Research Laboratory, Austin, TX (all authors)

  • Venue:
  • WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
  • Year:
  • 1999


Abstract

We have built a 16-way ccNUMA multiprocessor prototype to study the feasibility of building large-scale servers out of Standard High Volume (SHV) components. Using a cache-coherent interconnect, our prototype combines four 4-processor SMPs built using 350MHz Intel Xeon™ processors, yielding a 16-way system with a total of 4 GBytes of physical memory distributed over the nodes. Such an environment poses several performance challenges to Windows NT®, which assumes that memory is equidistant from all processors. To address these challenges, we have implemented an abstraction called a Resource Set, which allows threads to specify their execution and memory affinity across the ccNUMA complex. We used a suite of parallel applications to evaluate the scalability and performance of the system. Our results confirm the feasibility of building ccNUMA systems out of SHV components, and suggest that memory allocation affinity should be incorporated into the standard Windows NT API. We also find that the performance degradation caused by the limited bus bandwidth of the current generation of Intel-based processors often dominates the degradation caused by the latency of remote memory accesses.
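
The paper's Resource Set interface itself is not reproduced here, so the sketch below uses the later Win32 NUMA functions (GetNumaNodeProcessorMask, SetThreadAffinityMask, VirtualAllocExNuma) only as a rough modern analogue of the kind of execution and memory affinity control the abstract describes; it is an illustration of the idea, not the authors' API.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        ULONG highest = 0;
        if (!GetNumaHighestNodeNumber(&highest)) {
            fprintf(stderr, "NUMA topology query failed\n");
            return 1;
        }

        /* Pin to node 0, analogous to a single-node resource set. */
        UCHAR node = 0;
        ULONGLONG mask = 0;
        if (!GetNumaNodeProcessorMask(node, &mask)) {
            return 1;
        }

        /* Execution affinity: restrict this thread to node 0's processors. */
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask);

        /* Memory affinity: request pages backed by node 0's local memory. */
        void *buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, 1 << 20,
                                       MEM_RESERVE | MEM_COMMIT,
                                       PAGE_READWRITE, node);
        if (buf == NULL) {
            return 1;
        }

        /* ... touch buf from this thread so accesses stay node-local ... */

        VirtualFree(buf, 0, MEM_RELEASE);
        return 0;
    }

Binding both the thread and its allocation to the same node keeps memory accesses local, which is the behavior the Resource Set abstraction was introduced to make expressible on a ccNUMA system.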