A PC grid is a cost-effective grid-computing platform that attracts users by allocating as many desktop computers as their massively parallel applications request. The challenge is distributing the necessary files to remote computing nodes that may not share a network file system, may lack the disk space to hold entire files, and may even be powered off asynchronously.

Targeting PC grids, the AgentTeamwork grid-computing middleware deploys a hierarchy of mobile agents to remote desktops to launch, monitor, checkpoint, and resume a parallel and distributed computing job. To achieve high-speed file distribution, AgentTeamwork takes advantage of this agent hierarchy. The system partitions files into stripes at the tree root if they are random-access files, duplicates them at each tree level if they are shared among all remote nodes, fragments them into smaller messages if they are too large to relay to a lower tree level, aggregates such messages into a larger fragment if they are in transit to the same subtree, and returns output files to the user along multiple paths established within the tree. To achieve fault-tolerant file delivery, each agent periodically takes a snapshot of its in-transit and in-memory file messages together with its user job, so that after a crash both can be resumed from the latest snapshot.

This paper presents an implementation of AgentTeamwork's file-distribution algorithm, including file partitioning, transfer, checkpointing, and consistency maintenance, and reports its competitive performance.
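The partitioning, fragmentation, and aggregation steps named in the abstract can be illustrated with a minimal Python sketch. It is not AgentTeamwork's implementation: the function names (`stripe`, `fragment`, `aggregate`), the near-equal stripe sizes, and the `subtree_of` lookup table are all assumptions chosen for illustration, and the abstract does not specify how the real system sizes stripes or identifies subtrees.

```python
from collections import defaultdict


def stripe(data: bytes, n_nodes: int) -> list[bytes]:
    """Split data into n_nodes contiguous stripes of near-equal size.

    Hypothetical policy: the first (len(data) % n_nodes) stripes get one extra byte.
    """
    base, extra = divmod(len(data), n_nodes)
    stripes, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)
        stripes.append(data[start:start + size])
        start += size
    return stripes


def fragment(payload: bytes, max_size: int) -> list[bytes]:
    """Cut a payload into messages of at most max_size bytes each."""
    return [payload[i:i + max_size] for i in range(0, len(payload), max_size)]


def aggregate(messages, subtree_of):
    """Group (destination, chunk) messages by the subtree that holds the destination.

    subtree_of is an assumed mapping from destination node to subtree label.
    """
    batches = defaultdict(list)
    for dest, chunk in messages:
        batches[subtree_of[dest]].append((dest, chunk))
    return dict(batches)
```

For example, `stripe(b"abcdefghij", 3)` yields three stripes whose concatenation restores the original bytes, and `aggregate` lets a parent forward one batch per child subtree instead of one message per destination.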