Hierarchical clustering: a structure for scalable multiprocessor operating system design
The Journal of Supercomputing - Special issue: trends in parallel operating systems
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Tornado: maximizing locality and concurrency in a shared memory multiprocessor operating system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Evaluation of NUMA Memory Management Through Modeling and Measurements
IEEE Transactions on Parallel and Distributed Systems
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
(De-) Clustering Objects for Multiprocessor System Software
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
K42: building a complete operating system
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Optimizing IPC Performance for Shared-Memory Multiprocessors
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
On the importance of parallel application placement in NUMA multiprocessors
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
What can performance counters do for memory subsystem analysis?
Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Enabling high-performance memory migration for multithreaded applications on LINUX
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resource-conscious scheduling for energy efficiency on multicore processors
Proceedings of the 5th European conference on Computer systems
Contention-Aware Scheduling on Multicore Systems
ACM Transactions on Computer Systems (TOCS)
Thread Tranquilizer: Dynamically reducing performance variation
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Toward predictable performance in software packet-processing platforms
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Matching memory access patterns and data placement for NUMA systems
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Nonuniform memory affinity strategy in multithreaded sparse matrix computations
Proceedings of the 2012 Symposium on High Performance Computing
Dynamic virtual machine scheduling in clouds for architectural shared resources
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
A template library to integrate thread scheduling and locality management for NUMA multiprocessors
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
MemProf: a memory profiler for NUMA multicore systems
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
The power of batching in the Click modular router
Proceedings of the Asia-Pacific Workshop on Systems
Scalability-based manycore partitioning
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
The power of batching in the click modular router
APSys'12 Proceedings of the Third ACM SIGOPS Asia-Pacific conference on Systems
Who watches the watchmen? - protecting operating system reliability mechanisms
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ADAPT: A framework for coscheduling multithreaded programs
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Automatic generation of program affinity policies using machine learning
CC'13 Proceedings of the 22nd international conference on Compiler Construction
Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
CPI2: CPU performance isolation for shared compute clusters
Proceedings of the 8th ACM European Conference on Computer Systems
Improving execution unit occupancy on SMT-based processors through hardware-aware thread scheduling
Future Generation Computer Systems
Hi-index | 0.00 |
On multicore systems, contention for shared resources occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers. Previous work investigated how contention could be addressed via scheduling. A contention-aware scheduler separates competing threads onto separate memory hierarchy domains to eliminate resource sharing and, as a consequence, to mitigate contention. However, all previous work on contention-aware scheduling assumed that the underlying system is UMA (uniform memory access latencies, single memory controller). Modern multicore systems, however, are NUMA, which means that they feature non-uniform memory access latencies and multiple memory controllers. We discovered that state-of-the-art contention management algorithms fail to be effective on NUMA systems and may even hurt performance relative to a default OS scheduler. In this paper we investigate the causes for this behavior and design the first contention-aware algorithm for NUMA systems.