The thesis of this paper is that scheduling decisions in large-scale, shared-memory NUMA (Non-Uniform Memory Access) multiprocessors must consider not only how many processors to allocate to each application, but also which processors. We call the problem of assigning the parallel processes of an application to processors application placement. We explore the importance of placement decisions by measuring the execution time of several parallel applications under different placements on a shared-memory NUMA multiprocessor. The results of these experiments lead us to conclude that, as expected, placement decisions have only a minor effect on the execution time of parallel applications in small-scale, mildly NUMA multiprocessors. However, the results also show that placement decisions in large-scale multiprocessors are critical, and that localization that considers the architectural clusters inherent in these systems is essential. Our experiments also show that the importance of placement decisions increases substantially with the size and NUMAness of the system, and that the placement of individual processes of an application within the chosen subset of processors also significantly affects performance.
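To make the distinction between "how many" and "which" processors concrete, the sketch below contrasts two ways of choosing the same number of processors on a clustered machine. It is a hypothetical illustration, not the placement policy evaluated in the paper. It assumes processors are numbered consecutively within equal-sized clusters, and the names `localized_placement`, `scattered_placement`, and `clusters_spanned` are invented for this example. The localized policy fills the clusters with the most free processors first, so an application spans as few clusters as possible. The scattered policy deals processors round-robin across clusters.

```python
from collections import defaultdict


def cluster_of(proc: int, cluster_size: int) -> int:
    """Cluster index of a processor (assumes consecutive numbering
    within equal-sized clusters; a hypothetical machine model)."""
    return proc // cluster_size


def _group_by_cluster(free: list[int], cluster_size: int) -> dict[int, list[int]]:
    """Map each cluster index to its free processors, in ascending order."""
    by_cluster: dict[int, list[int]] = defaultdict(list)
    for p in sorted(free):
        by_cluster[cluster_of(p, cluster_size)].append(p)
    return by_cluster


def localized_placement(n: int, free: list[int], cluster_size: int) -> list[int]:
    """Choose n free processors, visiting clusters with the most free
    processors first (ties broken by lower cluster index) so that the
    application spans as few clusters as possible."""
    by_cluster = _group_by_cluster(free, cluster_size)
    chosen: list[int] = []
    for c in sorted(by_cluster, key=lambda c: (-len(by_cluster[c]), c)):
        for p in by_cluster[c]:
            if len(chosen) == n:
                return chosen
            chosen.append(p)
    if len(chosen) < n:
        raise ValueError("not enough free processors")
    return chosen


def scattered_placement(n: int, free: list[int], cluster_size: int) -> list[int]:
    """Choose n free processors by taking one from each cluster in turn
    (round-robin), ignoring locality."""
    by_cluster = _group_by_cluster(free, cluster_size)
    queues = [by_cluster[c] for c in sorted(by_cluster)]
    chosen: list[int] = []
    i = 0
    while len(chosen) < n:
        if not any(queues):
            raise ValueError("not enough free processors")
        q = queues[i % len(queues)]
        if q:
            chosen.append(q.pop(0))
        i += 1
    return chosen


def clusters_spanned(placement: list[int], cluster_size: int) -> int:
    """Number of distinct clusters an application's processors occupy."""
    return len({cluster_of(p, cluster_size) for p in placement})


if __name__ == "__main__":
    # Hypothetical machine: 16 processors in clusters of 4, all free.
    free = list(range(16))
    local = localized_placement(4, free, 4)
    spread = scattered_placement(4, free, 4)
    print(local, clusters_spanned(local, 4))    # [0, 1, 2, 3] 1
    print(spread, clusters_spanned(spread, 4))  # [0, 4, 8, 12] 4
```

Both placements allocate four processors, so a scheduler that tracks only processor counts cannot tell them apart. On a large NUMA machine, however, the scattered placement forces the application's processes to communicate across every cluster, which is the kind of difference the experiments above measure.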