The Trellis security infrastructure for overlay metacomputers and bridged distributed file systems
Journal of Parallel and Distributed Computing - Special issue: Security in grid and distributed systems
In metacomputing and grid computing, a computational job may execute on a node that is geographically far away from its data files. In such a situation, some of the issues to be resolved are: First, how can the job access its data? Second, how can the high latency and low bandwidth bottlenecks of typical wide-area networks (WANs) be tolerated? Third, how can the deployment of distributed file systems be made easier? The Trellis Network File System (Trellis NFS) uses a simple, global namespace to provide basic remote data access. Data from any node accessible by Secure Copy can be opened like a file. Aggressive caching strategies for file data and metadata can greatly improve performance across WANs. And, by using a bridging strategy between the well-known Network File System (NFS) and wide-area protocols, the deployment is greatly simplified. As part of the Third Canadian Internetworked Scientific Supercomputer (CISS-3) experiment, Trellis NFS was used as a distributed file system between high-performance computing (HPC) sites across Canada. CISS-3 ramped up over several months, ran in production mode for over 48 hours, and at its peak, had over 4,000 jobs running concurrently. Typically, there were about 180 concurrent jobs using Trellis NFS. We discuss the functionality, scalability, and benchmarked performance of Trellis NFS. Our hands-on experience with CISS and Trellis NFS has reinforced our design philosophy of layering, overlaying, and bridging systems to provide new functionality.
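To illustrate the global-namespace idea described above, a Trellis-style system can name any file reachable by Secure Copy with an SCP-style locator such as `user@host:/path`, and bridge reads of that name to an `scp` transfer. The sketch below is a minimal, hypothetical illustration of that pattern — the `parse_locator` and `fetch` names and the locator syntax are assumptions for exposition, not the actual Trellis NFS interface.

```python
import re
import subprocess
from dataclasses import dataclass


@dataclass
class SCPLocator:
    """Components of a hypothetical SCP-style global name, e.g. "user@host:/dir/file"."""
    user: str   # empty string if no user was given
    host: str
    path: str


def parse_locator(name: str) -> SCPLocator:
    # Split "user@host:/path" (user optional) into its parts.
    # The real Trellis namespace syntax may differ from this sketch.
    m = re.fullmatch(r"(?:(?P<user>[^@]+)@)?(?P<host>[^:]+):(?P<path>.+)", name)
    if m is None:
        raise ValueError(f"not an SCP-style locator: {name!r}")
    return SCPLocator(m.group("user") or "", m.group("host"), m.group("path"))


def fetch(locator: SCPLocator, dest: str) -> None:
    # Bridge the global name to the wide-area protocol by shelling out
    # to scp, analogous to how Trellis NFS bridges NFS requests to
    # Secure Copy transfers. (A real system would also cache the result.)
    src = f"{locator.user + '@' if locator.user else ''}{locator.host}:{locator.path}"
    subprocess.run(["scp", "-q", src, dest], check=True)
```

In this style, a job can refer to data on any SSH-reachable node by one uniform name, and the caching layer described in the abstract would sit between `parse_locator` and `fetch` to hide WAN latency on repeated accesses.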