A novel dynamic network data replication scheme based on historical access record and proactive deletion

Authors:
Zhe Wang;Tao Li;Naixue Xiong;Yi Pan
Affiliations:
Dept. of Computer Science, Sichuan University, Chengdu, China;Dept. of Computer Science, Sichuan University, Chengdu, China;Dept. of Computer Science, Georgia State University, Atlanta, USA;Dept. of Computer Science, Georgia State University, Atlanta, USA
Venue:
The Journal of Supercomputing
Year:
2012

Abstract

Data replication is becoming a popular technology in many fields, such as cloud storage, data grids, and P2P systems. Replicating files to other servers or nodes reduces network traffic and file access time, and increases data availability in the face of natural and man-made disasters. However, more replicas do not always yield better system performance. Replicas do decrease read access time and improve fault tolerance, but once write access is considered, maintaining a large number of replicas results in a large update overhead. Hence, a trade-off between read access time and write update cost is needed. File popularity is an important factor in data replication decisions. To smooth out fluctuations in data access, historical file popularity can be used to select files that are popular over time rather than only briefly. In this research, a dynamic data replication strategy is proposed based on two ideas. The first uses historical access records to choose which file to replicate. The second is a proactive deletion method that controls the number of replicas to balance read access time against write update overhead. A unified cost model is used to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that, under this cost model, our new algorithm performs much better than the existing algorithms.
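
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' algorithm. It assumes that historical popularity is an exponentially weighted average of per-interval access counts, and that the unified cost is a read term that shrinks with the replica count plus an update term that grows with it. All function names, the `decay` weight, and the `read_cost` and `update_cost` constants are hypothetical.

```python
def smoothed_popularity(history, decay=0.5):
    """Exponentially weighted average of per-interval access counts.

    `history` is ordered oldest first, so recent intervals weigh more.
    The weighting scheme is an assumption, not taken from the paper.
    """
    score = 0.0
    for count in history:
        score = decay * score + (1 - decay) * count
    return score


def total_cost(reads, writes, replicas, read_cost=10.0, update_cost=1.0):
    """Hypothetical unified cost for one file.

    Read cost falls as replicas are added; update cost rises because
    every write must reach every replica.
    """
    return reads * read_cost / replicas + writes * update_cost * replicas


def best_replica_count(reads, writes, max_replicas=8):
    """Replica count in 1..max_replicas that minimizes the assumed cost."""
    return min(range(1, max_replicas + 1),
               key=lambda r: total_cost(reads, writes, r))


def adjust_replicas(current, read_history, write_history):
    """Decide whether to replicate, proactively delete, or keep replicas."""
    reads = smoothed_popularity(read_history)
    writes = smoothed_popularity(write_history)
    target = best_replica_count(reads, writes)
    if target > current:
        return "replicate", target
    if target < current:
        return "delete", target
    return "keep", target
```

In this sketch, a file that was read heavily only in the distant past scores lower than one read heavily in the latest interval, and a write-heavy file is driven toward fewer replicas, which is the role the abstract assigns to proactive deletion.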