International Journal of Information and Communication Technology
Replication in grid file systems can significantly improve the I/O performance of data-intensive applications. However, most existing replication techniques operate on individual files, which can incur inefficient replication overhead when the number of files is large. We propose a file-clustering-based replication algorithm for grid file systems. Our algorithm groups files according to how often they are accessed simultaneously and stores replicas of the clustered files on storage nodes so as to satisfy most of the expected future read accesses to the clustered files while minimizing the replication time for individual files, subject to a given storage capacity limit. Our experiments in a grid environment of 20 nodes across 5 sites suggest that the proposed algorithm achieves accurate file clustering and efficient replica management: our clustering policy, with a file cluster size limit of 5120 MB and a replica storage capacity limit of 10240 MB, achieves 1.58 times the efficiency of a policy that never groups related files. The results also indicate that the overhead of introducing our algorithm does not significantly affect the I/O performance of running applications.
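The grouping idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple greedy merge over pairwise co-access counts, and the names `cluster_files`, `sessions`, `sizes_mb`, and `max_cluster_mb` are hypothetical. Only the 5120 MB cluster size limit comes from the abstract. Replica placement across storage nodes is not modeled.

```python
from collections import Counter
from itertools import combinations


def cluster_files(sessions, sizes_mb, max_cluster_mb=5120):
    """Greedily group files that are often accessed together.

    sessions: iterable of file-name lists, one list per job or time window
              (an assumed stand-in for "simultaneous accesses").
    sizes_mb: dict mapping every file name to its size in MB.
    """
    # Count how often each pair of files appears in the same session.
    co_access = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            co_access[(a, b)] += 1

    # Start with every file in its own cluster.
    cluster_of = {f: f for f in sizes_mb}
    members = {f: {f} for f in sizes_mb}
    cluster_mb = dict(sizes_mb)

    # Merge the most strongly related pairs first, respecting the size cap.
    ranked = sorted(co_access.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _count in ranked:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb or cluster_mb[ca] + cluster_mb[cb] > max_cluster_mb:
            continue
        members[ca] |= members.pop(cb)
        cluster_mb[ca] += cluster_mb.pop(cb)
        for f in members[ca]:
            cluster_of[f] = ca

    return sorted(sorted(m) for m in members.values())


sessions = [["a", "b"], ["a", "b"], ["b", "c"], ["d"]]
sizes_mb = {"a": 3000, "b": 2000, "c": 1000, "d": 500}
clusters = cluster_files(sessions, sizes_mb)
```

In this toy input, files a and b are merged because they co-occur most often, while c stays separate because adding it would exceed the 5120 MB cap. A replication policy could then treat each resulting cluster as a single unit when deciding what to copy.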