PFRF: An adaptive data replication algorithm based on star-topology data grids

Authors:
Ming-Chang Lee; Fang-Yie Leu; Ying-ping Chen
Affiliations:
Department of Computer Science, National Chiao Tung University, Taiwan; Department of Computer Science, Tunghai University, Taiwan; Department of Computer Science, National Chiao Tung University, Taiwan
Venue:
Future Generation Computer Systems
Year:
2012

Abstract

Data replication algorithms are widely employed in data grids to replicate frequently accessed data to appropriate sites. The goals are to shorten file transmission distances and to deliver files to local sites from nearby sites, thereby improving data access performance and reducing bandwidth consumption. Some of these algorithms assume unlimited storage, which makes them impractical in real-world data grids, since no system has infinite storage. Others were designed for limited-storage environments, but none of them considers data access patterns, which reflect changes in users' interests and are important factors affecting file retrieval efficiency and bandwidth consumption. In this paper, we propose an adaptive data replication algorithm, called the Popular File Replicate First algorithm (PFRF for short), which is designed for a star-topology data grid with limited storage space and is driven by aggregated information on previous file accesses. PFRF periodically calculates file access popularity to track variation in users' access behavior, and then replicates popular files to appropriate sites to adapt to that variation. We evaluate PFRF under several types of file access behavior, including Zipf-like, geometric, and uniform distributions. The simulation results show that, compared with the other tested algorithms, PFRF effectively improves average job turnaround time, bandwidth consumption for data delivery, and data availability.
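The general idea of periodic popularity tracking followed by popularity-first replication under a storage limit can be illustrated with a minimal Python sketch. This is not the PFRF algorithm itself: the abstract does not give the popularity formula or the replica placement rule, so the exponential `decay` factor, the greedy fit-by-capacity selection, and all function names, file names, and sizes below are assumptions made purely for illustration. The Zipf-like weights follow the usual definition, with access probability proportional to 1 / rank^alpha.

```python
import random
from collections import Counter


def zipf_like_weights(n_files, alpha=1.0):
    """Access probabilities proportional to 1 / rank**alpha, normalized to sum to 1."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_files + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def update_popularity(popularity, accesses, decay=0.5):
    """Fold one period's accesses into a running popularity score.

    Assumption: older popularity is discounted by `decay` each period.
    The paper's actual formula is not stated in the abstract.
    """
    counts = Counter(accesses)
    files = set(popularity) | set(counts)
    return {f: decay * popularity.get(f, 0.0) + counts.get(f, 0) for f in files}


def select_replicas(popularity, sizes, capacity):
    """Greedily pick the most popular files that still fit in `capacity`.

    Ties are broken by file name so the result is deterministic.
    """
    chosen, used = [], 0
    for f in sorted(popularity, key=lambda name: (-popularity[name], name)):
        if used + sizes[f] <= capacity:
            chosen.append(f)
            used += sizes[f]
    return chosen


if __name__ == "__main__":
    files = ["f1", "f2", "f3", "f4"]               # hypothetical file names
    sizes = {"f1": 4, "f2": 3, "f3": 5, "f4": 2}   # made-up sizes
    rng = random.Random(0)
    popularity = {}
    for period in range(3):
        # Simulate one period of Zipf-like accesses, then re-plan replicas.
        accesses = rng.choices(files, weights=zipf_like_weights(len(files)), k=50)
        popularity = update_popularity(popularity, accesses)
        print(period, select_replicas(popularity, sizes, capacity=8))
```

Recomputing the score every period lets the replica set follow shifts in user interest, while the capacity check reflects the limited-storage setting the abstract emphasizes. A uniform or geometric workload can be simulated by swapping in different weights.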