File Clustering Based Replication Algorithm in a Grid Environment

Authors:
Hitoshi Sato, Satoshi Matsuoka, Toshio Endo
Venue:
CCGRID '09: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Year:
2009

Abstract

Replication in grid file systems can significantly improve the I/O performance of data-intensive applications. However, most existing replication techniques operate on individual files, which can incur inefficient replication overhead when a large number of files is involved. We propose a file clustering based replication algorithm for grid file systems. Our algorithm groups files that tend to be accessed simultaneously and stores replicas of the clustered files on storage nodes so that the expected future read access times for the clustered files and the replication times for individual files are minimized under a given storage capacity limit. Experiments in a grid environment of 20 nodes across 5 sites suggest that the proposed algorithm achieves accurate file clustering and efficient replica management: with a file cluster size limit of 5120 MB and a replica storage capacity limit of 10240 MB, our clustering policy is 1.58 times more efficient than a policy that never groups related files. The results also indicate that the overhead of introducing our algorithm does not significantly affect the I/O performance of running applications.
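The abstract does not specify the clustering criterion or the cost model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "simultaneous access" means two files appear in the same access session, and it uses a hypothetical cost model in which every remote read of a cluster costs size / bandwidth seconds and creating a replica costs one such transfer. Files are merged greedily by co-access count (union-find) while each cluster stays under the 5120 MB size limit, and the 10240 MB replica budget is then filled with a greedy knapsack heuristic. The function names, `min_co_access`, `bandwidth_mb_s`, and the toy trace are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_access_counts(sessions):
    """Count how often each pair of files appears in the same access session."""
    counts = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_files(sizes_mb, sessions, cluster_limit_mb=5120, min_co_access=2):
    """Greedily merge the most frequently co-accessed files (union-find),
    skipping any merge that would push a cluster past cluster_limit_mb."""
    parent = {f: f for f in sizes_mb}
    total_mb = dict(sizes_mb)  # total cluster size, kept up to date for each root

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    pairs = sorted(co_access_counts(sessions).items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), n in pairs:
        if n < min_co_access:
            break  # pairs are sorted by count, so all remaining pairs are weaker
        ra, rb = find(a), find(b)
        if ra == rb or total_mb[ra] + total_mb[rb] > cluster_limit_mb:
            continue
        parent[rb] = ra
        total_mb[ra] += total_mb[rb]

    clusters = defaultdict(list)
    for f in sizes_mb:
        clusters[find(f)].append(f)
    return [sorted(members) for members in clusters.values()]


def select_replicas(clusters, sizes_mb, sessions, capacity_mb=10240, bandwidth_mb_s=100.0):
    """Greedy knapsack: replicate the clusters with the highest estimated
    time saving per MB until the replica capacity is used up."""
    candidates = []
    for members in clusters:
        size = sum(sizes_mb[f] for f in members)
        # Assumption: a session touching any member reads the whole cluster once.
        reads = sum(1 for s in sessions if set(s) & set(members))
        # Hypothetical cost model: each remote read costs size / bandwidth
        # seconds, and creating the replica costs one such transfer.
        saving_s = (reads - 1) * size / bandwidth_mb_s
        if saving_s > 0 and size <= capacity_mb:
            candidates.append((saving_s / size, members, size))
    candidates.sort(key=lambda c: (-c[0], c[1]))

    chosen, used_mb = [], 0
    for _, members, size in candidates:
        if used_mb + size <= capacity_mb:
            chosen.append(members)
            used_mb += size
    return chosen, used_mb


if __name__ == "__main__":
    sizes = {"a.dat": 2000, "b.dat": 2000, "c.dat": 1500, "d.dat": 4000, "e.dat": 3000}
    sessions = [["a.dat", "b.dat"], ["a.dat", "b.dat", "c.dat"], ["a.dat", "b.dat"],
                ["d.dat"], ["d.dat", "e.dat"], ["c.dat"]]
    clusters = cluster_files(sizes, sessions)
    print(clusters)
    print(select_replicas(clusters, sizes, sessions))
```

With the toy trace above, a.dat and b.dat (co-accessed three times) form one 4000 MB cluster, and the greedy selection replicates that cluster plus c.dat and d.dat, using 9500 MB of the 10240 MB budget; e.dat, read only once, is not replicated. Setting `cluster_limit_mb=0` disables grouping, which gives a rough stand-in for the "never group related files" baseline under this toy model.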