Cluster-based file replication in large-scale distributed systems

  • Authors:
  • Harjinder S. Sandhu;Songnian Zhou

  • Affiliations:
  • Computer Systems Research Institute, University of Toronto, Toronto, ON, M5S 1A4;Computer Systems Research Institute, University of Toronto, Toronto, ON, M5S 1A4

  • Venue:
  • SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
  • Year:
  • 1992

Quantified Score

Hi-index 0.00

Visualization

Abstract

The increasing need for data sharing in large-scale distributed systems may place a heavy burden on critical resources such as file servers and networks. Our examination of the workload in one large commercial engineering environment shows that wide-spread sharing of unstable files among tens to hundreds of users is common. Traditional client-based file cacheing techniques are not scalable in such environments.We propose Frolic, a scheme for cluster-based file replication in large-scale distributed file systems. A cluster is a group of workstations and one or more file servers on a local area network. Large distributed systems may have tens or hundreds of clusters connected by a backbone network. By dynamically creating and maintaining replicas of shared files on the file servers in the clusters using those files, we effectively reduce reliance on central servers supporting such files, as well as reduce the distances between the accessing sites and data. We propose and study algorithms for the two main issues in Frolic, 1) locating a valid file replica, and 2) maintaining consistency among replicas. Our simulation experiments using a statistical workload model based upon measurement data and real workload characteristics show that cluster-based file replication can significantly reduce file access delays and server and backbone network utilizations in large-scale distributed systems over a wide range of workload conditions. The workload characteristics most critical to replication performance are: the size of shared files, the number of clusters that modify a file, and the number of consecutive accesses to files from a particular cluster.