Replication in grid file systems can significantly improve the I/O performance of data-intensive grid applications, but creating and placing replicas manually would be impractical in a real grid environment, where a single application may access thousands to millions of files. The number and location of replicas should be determined automatically from application access patterns and network throughput, so as to achieve high application throughput while minimizing replica space overhead; previous studies, however, have explored only limited parameter spaces in their algorithmic optimizations. We propose an automated replication algorithm that allows most I/O accesses to complete within a given time threshold while minimizing the space overhead of replication. The algorithm models replication as a combinatorial optimization problem in which the constraints are derived from the given access time threshold and various system parameters, and the objective function is to minimize file replication costs. We solve the optimization problem by dynamically monitoring and estimating inter-node link throughputs and the file access patterns of running applications. Our simulation-based studies suggest that the proposed algorithm can achieve higher performance than simple techniques, such as those that always or never create replicas, while keeping storage usage very low. The results also indicate that the proposed automated algorithm can perform comparably with manual replica placement.
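To make the shape of the optimization concrete, the following is a minimal sketch for a single file, not the algorithm evaluated in the paper. It assumes access time is simply file size divided by link throughput, uses the number of replicas as the cost, and finds the smallest replica set by exhaustive search. The function names, the `coverage` parameter (standing in for "most" accesses), and the example numbers are all hypothetical.

```python
from itertools import combinations


def access_time(size_mb, throughput_mb_s):
    """Assumed cost model: transfer time = file size / link throughput."""
    return size_mb / throughput_mb_s


def place_replicas(size_mb, accesses, throughput, origin, threshold_s, coverage=0.9):
    """Return the smallest set of nodes holding the file (origin included) such
    that at least `coverage` of all accesses finish within `threshold_s`.

    accesses:   {reader_node: access_count}          (hypothetical monitoring data)
    throughput: {reader_node: {holder_node: MB/s}}   (hypothetical estimates)
    """
    nodes = sorted(throughput)
    candidates = [n for n in nodes if n != origin]
    total = sum(accesses.values())
    # Try replica sets in order of increasing size, so the first feasible
    # set found uses the fewest replicas (minimum space overhead).
    for k in range(len(candidates) + 1):
        for extra in combinations(candidates, k):
            holders = (origin,) + extra
            fast = sum(
                count
                for reader, count in accesses.items()
                # Each reader fetches from whichever holder is fastest for it.
                if min(access_time(size_mb, throughput[reader][h]) for h in holders)
                <= threshold_s
            )
            if fast >= coverage * total:
                return set(holders)
    # Fallback: no placement meets the constraint; replicate everywhere.
    return set(nodes)


# Hypothetical three-node example: a 100 MB file that initially lives on node A.
throughput = {
    "A": {"A": 1000, "B": 100, "C": 10},
    "B": {"A": 100, "B": 1000, "C": 20},
    "C": {"A": 10, "B": 20, "C": 1000},
}
accesses = {"A": 10, "B": 10, "C": 30}
placement = place_replicas(100, accesses, throughput, origin="A", threshold_s=2.0)
print(placement)
```

In this example node C issues most of the accesses but sits behind slow links, so the search adds a single replica on C and none on B. Exhaustive search is exponential in the number of nodes and is used here only to show the constraint and the objective; a practical system would need a scalable solver and continuously updated throughput and access estimates.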