Bridging local and wide area networks for overlay distributed file systems

Authors:
Michael Closson; Paul Lu
Affiliations:
Dept. of Computing Science, University of Alberta, Edmonton, Alberta, Canada; Dept. of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Venue:
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
Year:
2005

Abstract

In metacomputing and grid computing, a computational job may execute on a node that is geographically far from its data files. Such a situation raises several issues. First, how can the job access its data? Second, how can the high latency and low bandwidth of typical wide-area networks (WANs) be tolerated? Third, how can the deployment of distributed file systems be made easier? The Trellis Network File System (Trellis NFS) uses a simple, global namespace to provide basic remote data access: data on any node accessible by Secure Copy can be opened like a file. Aggressive caching strategies for file data and metadata can greatly improve performance across WANs. By bridging between the well-known Network File System (NFS) and wide-area protocols, Trellis NFS also greatly simplifies deployment. As part of the Third Canadian Internetworked Scientific Supercomputer (CISS-3) experiment, Trellis NFS was used as a distributed file system between high-performance computing (HPC) sites across Canada. CISS-3 ramped up over several months, ran in production mode for over 48 hours, and, at its peak, had over 4,000 jobs running concurrently. Typically, about 180 concurrent jobs used Trellis NFS. We discuss the functionality, scalability, and benchmarked performance of Trellis NFS. Our hands-on experience with CISS and Trellis NFS has reinforced our design philosophy of layering, overlaying, and bridging systems to provide new functionality.