The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Scheduling in multiprogrammed parallel systems
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Speedup Versus Efficiency in Parallel Systems
IEEE Transactions on Computers
Workcrews: an abstraction for controlling parallelism
International Journal of Parallel Programming
Design Tradeoffs for Process Scheduling in Shared Memory Multiprocessor Systems
IEEE Transactions on Software Engineering
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Multi-level shared caching techniques for scalability in VMP-M/C
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Characterizations of parallelism in applications and their use in scheduling
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Processor scheduling in shared memory multiprocessors
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The performance of multiprogrammed multiprocessor scheduling algorithms
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Hector: A Hierarchically Structured Shared-Memory Multiprocessor
Computer - Special issue on experimental research in computer architecture
PLUS: a distributed shared-memory system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Experimental Comparison of Memory Management Policies for NUMA Multiprocessors
Experimental Comparison of Memory Management Policies for NUMA Multiprocessors
Processor scheduling on multiprogrammed, distributed memory parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Processor allocation policies for message-passing parallel computers
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Scheduling memory constrained jobs on distributed memory parallel computers
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
An Efficient Adaptive Scheduling Scheme for Distributed Memory Multicomputers
IEEE Transactions on Parallel and Distributed Systems
Analysis of Processor Allocation in Multiprogrammed, Distributed-Memory Parallel Processing Systems
IEEE Transactions on Parallel and Distributed Systems
Local versus Global Schedulers with Processor Co-allocation in Multicluster Systems
JSSPP '02 Revised Papers from the 8th International Workshop on Job Scheduling Strategies for Parallel Processing
Parallel Job Scheduling: A Performance Perspective
Performance Evaluation: Origins and Directions
On the importance of parallel application placement in NUMA multiprocessors
Sedms'93 USENIX Systems on USENIX Experiences with Distributed and Multiprocessor Systems - Volume 4
Rethink the virtual machine template
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Optimizing the access to read-only data in grid computing
DAIS'05 Proceedings of the 5th IFIP WG 6.1 international conference on Distributed Applications and Interoperable Systems
Hi-index | 0.00 |
Large-scale Non-Uniform Memory Access (NUMA) multiprocessors are gaining increased attention due to their potential for achieving high performance through the replication of relatively simple components. Because of the complexity of such systems, scheduling algorithms for parallel applications are crucial in realizing the performance potential of these systems. In particular, scheduling methods must consider the scale of the system, with the increased likelihood of creating bottlenecks, along with the NUMA characteristics of the system, and the benefits to be gained by placing threads close to their code and data.We propose a class of scheduling algorithms based on processor pools. A processor pool is a software construct for organizing and managing a large number of processors by dividing them into groups called pools. The parallel threads of a job are run in a single processor pool, unless there are performance advantages for a job to span multiple pools. Several jobs may share one pool. Our simulation experiments show that processor pool-based scheduling may effectively reduce the average job response time. The performance improvements attained by using processor pools increase with the average parallelism of the jobs, the load level of the system, the differentials in memory access costs, and the likelihood of having system bottlenecks. As the system size increasesr, while maintaining the workload composition and intensity, we observed that processor pools can be used to provide significant performance improvements. We therefore conclude that processor pool-based scheduling may be an effective and efficient technique for scalable systems.