Thread scheduling for multiprogrammed multiprocessors. Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures.
Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM).
The data locality of work stealing. Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures.
The natural work-stealing algorithm is stable. FOCS '01: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science.
X10: an object-oriented approach to non-uniform cluster computing. OOPSLA '05: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications.
Deadlock-free scheduling of X10 computations with bounded resources. Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures.
Adaptive and reliable parallel computing on networks of workstations. ATEC '97: Proceedings of the Annual Conference on USENIX Annual Technical Conference.
Productivity and performance using partitioned global address space languages. Proceedings of the 2007 International Workshop on Parallel Symbolic Computation.
Parallel programmability and the Chapel language. International Journal of High Performance Computing Applications.
Affinity driven distributed scheduling algorithm for parallel computations. ICDCN '11: Proceedings of the 12th International Conference on Distributed Computing and Networking.
With the advent of many-core architectures and the strong need for petascale (and exascale) performance in scientific domains and industry analytics, efficient scheduling of parallel computations for higher productivity and performance has become very important. Further, movement of massive amounts (terabytes to petabytes) of data is very expensive, which necessitates affinity-driven computation. Distributed scheduling of parallel computations on multiple places therefore needs to optimize multiple performance objectives: follow affinity maximally while ensuring efficient space, time, and message complexity. Simultaneous consideration of these objectives makes distributed scheduling a particularly challenging problem. In addition, parallel computations have data-dependent execution patterns, which requires online scheduling to effectively optimize the computation orchestration as it unfolds. This paper presents an online algorithm for affinity-driven distributed scheduling of multi-place parallel computations. To optimize multiple performance objectives simultaneously, our algorithm combines a low time- and message-complexity mechanism for ensuring affinity with randomized work stealing within places for load balancing. We provide theoretical analysis of the expected and probabilistic lower and upper bounds on the time and message complexity of this algorithm. On multi-core clusters such as Blue Gene/P (an MPP architecture) and an Intel multi-core cluster, we demonstrate performance close to custom MPI+Pthreads code. Further, we demonstrate strong, weak, and data (increasing input data size) scalability on multi-core clusters. Using well-known benchmarks, we demonstrate 16% to 30% performance gains compared to Cilk [6] on the multi-core Intel Xeon 5570 (NUMA) architecture. Detailed experimental analysis illustrates efficient space (main memory) utilization as well.
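The two mechanisms named above — routing each task to the place its affinity designates, and randomized work stealing among workers within a place — can be illustrated with a minimal sketch. This is not the paper's algorithm or its complexity-bounded implementation; the names (Place, spawn, next_task, schedule) and the single-threaded structure are illustrative assumptions only.

```python
import collections
import random

class Place:
    # One "place" (e.g. an SMP node) holding one deque per worker.
    # Workers pop their own work LIFO and steal from random victims FIFO,
    # the usual work-stealing discipline; stealing never crosses places.
    def __init__(self, pid, n_workers):
        self.pid = pid
        self.deques = [collections.deque() for _ in range(n_workers)]

    def spawn(self, worker, task):
        # A worker pushes locally spawned work onto its own deque.
        self.deques[worker].append(task)

    def next_task(self, worker, rng):
        # Prefer own work (LIFO end); otherwise steal from a random
        # non-empty victim deque (FIFO end) within this place.
        if self.deques[worker]:
            return self.deques[worker].pop()
        victims = [d for i, d in enumerate(self.deques)
                   if i != worker and d]
        if victims:
            return rng.choice(victims).popleft()
        return None  # place is (momentarily) out of work

def schedule(tasks, places, rng):
    # Affinity step: each task goes to the place its affinity names,
    # landing on a random worker's deque there.
    for task, affinity in tasks:
        worker = rng.randrange(len(places[affinity].deques))
        places[affinity].spawn(worker, task)
```

A worker loop at place p would then repeatedly call `p.next_task(w, rng)` and execute what it gets; because stealing is confined to the place, every task runs where its affinity sent it, while the randomized steals balance load among that place's workers.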
To the best of our knowledge, this is the first multi-objective, affinity-driven distributed scheduling algorithm to be designed, theoretically analyzed, and experimentally evaluated in a multi-place setup for multi-core cluster architectures.