SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Scalable Shared-Memory Multiprocessing
Scalable Shared-Memory Multiprocessing
FTCS '98 Proceedings of the The Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing
Extending NT virtual memory by SCI-based hardware DSM
WINSYM'98 Proceedings of the 2nd conference on USENIX Windows NT Symposium - Volume 2
Brazos: a third generation DSM system
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
NUMA-aware memory manager with dominant-thread-based copying GC
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Experience with building a commodity intel-based ccNUMA system
IBM Journal of Research and Development
Hi-index | 0.00 |
We have built a 16-way, ccNUMA multiprocessor prototype to study the feasibility of building large scale servers out of Standard High Volume (SHV) components. Using a cache-coherent interconnect, our prototype combines four 4-processor SMPs built using 350MHz Intel XeonTM processors, yielding a 16-way system with a total of 4 GBytes of physical memory distributed over the nodes. Such an environment poses several performance challenges to Windows NT®, which assumes that memory is equidistant to all processors. To overcome these problems, we have implemented an abstraction called a Resource Set, which allows threads to specify their execution and memory affinity across the ccNUMA complex. We used a suite of parallel applications to evaluate the scalability and performance of the system. Our results confirm the feasibility of building ccNUMA systems out of SHV components, and suggest that memory allocation affinity should be incorporated as part of the standard Windows NT API. Also, the performance degradation due to poor bus bandwidth in the current generation of Intel-based processors often dominates the degradation due to the latency of remote memory accesses.