Windows NT in a ccNUMA system

Authors:
B. Brock;G. Carpenter;E. Chiprout;E. Elnozahy;M. Dean;D. Glasco;J. Peterson;R. Rajamony;F. Rawson;R. Rockhold;A. Zimmerman
Affiliations:
IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX;IBM Austin Research Laboratory, Austin, TX
Venue:
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Year:
1999

Abstract

We have built a 16-way ccNUMA multiprocessor prototype to study the feasibility of building large-scale servers out of Standard High Volume (SHV) components. Using a cache-coherent interconnect, our prototype combines four 4-processor SMPs built from 350 MHz Intel Xeon™ processors, yielding a 16-way system with a total of 4 GBytes of physical memory distributed over the nodes. Such an environment poses several performance challenges to Windows NT®, which assumes that memory is equidistant from all processors. To overcome these problems, we have implemented an abstraction called a Resource Set, which allows threads to specify their execution and memory affinity across the ccNUMA complex. We used a suite of parallel applications to evaluate the scalability and performance of the system. Our results confirm the feasibility of building ccNUMA systems out of SHV components, and suggest that memory allocation affinity should be incorporated into the standard Windows NT API. We also find that the performance degradation due to poor bus bandwidth in the current generation of Intel-based processors often dominates the degradation due to the latency of remote memory accesses.
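
The Resource Set idea can be illustrated with a small model. The sketch below is not the authors' implementation and not a Windows NT API: the class `ResourceSet`, the helpers `node_of_cpu`, `resource_set_for_node`, and `whole_machine`, and the assumption that processors 0 to 15 are numbered node by node are all hypothetical. Only the topology (four nodes of four processors each) comes from the abstract.

```python
from dataclasses import dataclass

NODES = 4           # four SMP nodes, per the abstract
CPUS_PER_NODE = 4   # four processors per node, per the abstract


def node_of_cpu(cpu: int) -> int:
    """Return the node hosting a CPU (assumes CPUs 0-15 are numbered node by node)."""
    if not 0 <= cpu < NODES * CPUS_PER_NODE:
        raise ValueError(f"cpu {cpu} is outside the 16-way system")
    return cpu // CPUS_PER_NODE


@dataclass(frozen=True)
class ResourceSet:
    """Hypothetical model: CPUs a thread may run on and nodes it may allocate memory from."""
    cpus: frozenset
    memory_nodes: frozenset

    def access_is_local(self, cpu: int, memory_node: int) -> bool:
        """True when a permitted CPU touches permitted memory that lives on its own node."""
        if cpu not in self.cpus or memory_node not in self.memory_nodes:
            raise ValueError("cpu or memory node is not part of this resource set")
        return node_of_cpu(cpu) == memory_node


def resource_set_for_node(node: int) -> ResourceSet:
    """Build a set that confines both execution and allocation to a single node."""
    if not 0 <= node < NODES:
        raise ValueError(f"node {node} does not exist")
    first = node * CPUS_PER_NODE
    cpus = frozenset(range(first, first + CPUS_PER_NODE))
    return ResourceSet(cpus=cpus, memory_nodes=frozenset({node}))


def whole_machine() -> ResourceSet:
    """Build a set spanning every CPU and node, i.e. no affinity at all."""
    return ResourceSet(
        cpus=frozenset(range(NODES * CPUS_PER_NODE)),
        memory_nodes=frozenset(range(NODES)),
    )
```

In this model, a thread bound to `resource_set_for_node(2)` can only run on CPUs 8 to 11 and only allocate from node 2, so every access it makes is local. A thread bound to `whole_machine()` may run on CPU 0 while its memory sits on node 3, which is the remote-access case the abstract describes as a challenge for an operating system that treats memory as equidistant.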