This article presents TakTuk, a middleware that efficiently deploys parallel remote executions on large-scale grids (thousands of nodes). The tool is mostly intended for interactive use, namely the administration of distributed machines and the development of parallel applications, so it has to minimize the time required to complete the whole deployment process. To this end, we propose and validate a model of remote execution deployment inspired by the real-world behavior of standard remote execution protocols (rsh and ssh). From this model and from existing work in networking, we deduce an optimal deployment algorithm for the homogeneous case. Unfortunately, this optimal algorithm does not translate directly to the heterogeneous case. We therefore derive from the theoretical solution a heuristic based on dynamic work stealing that adapts to heterogeneity (in processors, links, load, and so on). The heuristic rests on the same principle as the optimal algorithm: deploy nodes as soon as possible. Experiments assess TakTuk's efficiency and show that it scales well to thousands of nodes. Compared with similar tools, TakTuk ranks among the best performers while offering more features and versatility. In particular, TakTuk is the only tool really suited to deploying remote executions on grids or more heterogeneous platforms.
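The "deploy nodes as soon as possible" principle can be illustrated with a small simulation. This is only a sketch under assumed conditions, not TakTuk's actual model or algorithm: it assumes a homogeneous platform where starting a remote connection keeps the initiator busy for `send_cost` time units and the remote node becomes ready `latency` time units after the connection starts. The function names and both parameters are hypothetical and chosen for illustration.

```python
import heapq


def asap_deploy_time(n_nodes, send_cost=1.0, latency=3.0):
    """Time until n_nodes remote nodes are ready when every ready node
    starts a new connection as soon as it is free (assumed toy model)."""
    if n_nodes <= 0:
        return 0.0
    # Min-heap of instants at which some ready node can start a connection.
    # The root (the machine launching the deployment) is free at time 0.
    free_at = [0.0]
    started = 0
    finish = 0.0
    while started < n_nodes:
        t = heapq.heappop(free_at)      # earliest available initiator
        started += 1
        ready = t + latency             # the new node is ready at this time
        finish = max(finish, ready)
        heapq.heappush(free_at, t + send_cost)  # initiator is free again
        heapq.heappush(free_at, ready)          # new node joins the deployment
    return finish


def flat_deploy_time(n_nodes, send_cost=1.0, latency=3.0):
    """Baseline: only the root starts connections, one after another."""
    if n_nodes <= 0:
        return 0.0
    return (n_nodes - 1) * send_cost + latency


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, flat_deploy_time(n), asap_deploy_time(n))
```

In this toy model the flat strategy grows linearly with the number of nodes, whereas letting every deployed node take part makes the set of initiators grow over time. A heterogeneous platform would need per-node and per-link costs, which is where the abstract's work-stealing heuristic comes in; that part is not modeled here.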