Impact of Memory Contention on Dynamic Scheduling on NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Lightweight reference affinity analysis
Proceedings of the 19th annual international conference on Supercomputing
Hardware profile-guided automatic page placement for ccNUMA systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Operating system scheduling for chip multithreaded processors
Operating system scheduling for chip multithreaded processors
An efficient multi-level trace toolkit for multi-threaded applications
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Work-stealing for mixed-mode parallelism by deterministic team-building
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Application-specific thread schedulers for internet server applications
Concurrency and Computation: Practice & Experience
Design and Implementation of Portable and Efficient Non-blocking Collective Communication
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Hi-index | 0.00 |
Exploiting full computational power of current more and more hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture. Unfortunately, most operating systems only provide a poor scheduling API that does not allow applications to transmit valuable scheduling hints to the system. In a previous paper [1], we showed that using a bubble-based thread scheduler can significantly improve applications' performance in a portable way. However, since multithreaded applications have various scheduling requirements, there is no universal scheduler that could meet all these needs. In this paper, we present a framework that allows scheduling experts to implement and experiment with customized thread schedulers. It provides a powerful API for dynamically distributing bubbles among the machine in a high-level, portable, and efficient way. Several examples show how experts can then develop, debug and tune their own portable bubble schedulers.