Replica selection strategies in data grid

Authors:
Rashedur M. Rahman;Reda Alhajj;Ken Barker
Affiliations:
Department of Computer Science, University of Calgary, Calgary, Alberta, Canada;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada and Department of Computer Science, Global University, Beirut, Lebanon;Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Venue:
Journal of Parallel and Distributed Computing
Year:
2008

Abstract

Replication in Data Grids reduces access latency and bandwidth consumption. When several sites hold replicas of a dataset, selecting the best replica minimizes access latency. In this research, we propose two replica selection techniques. To select the best replica from locally gathered information, we exploit a simple technique, the k-Nearest Neighbor (KNN) rule. The KNN rule selects the best replica for a file by considering previous file transfer logs, which record the history of that file and of nearby files. We also propose a predictive technique to estimate the transfer time between sites. The predicted transfer time serves as an estimate of the transfer bandwidth of the sites that currently hold replicas, and helps in selecting the best replica among them. Simulation results demonstrate that the KNN rule significantly improves performance over the traditional replica catalog based model. In addition, the neural network predictive technique estimates the transfer time between sites more accurately than the multi-regression model.
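
The KNN rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the log schema (`TransferRecord`), the use of a numeric file ID as the only feature, the absolute-difference distance, and the function name `knn_select_replica` are all assumptions made for illustration. The sketch takes the k log records whose file IDs are closest to the requested file and lets them vote on which site to fetch from.

```python
from collections import Counter
from typing import NamedTuple


class TransferRecord(NamedTuple):
    """One past transfer (hypothetical log schema, not taken from the paper)."""
    file_id: int  # numeric file index; nearby IDs are treated as related files
    site: str     # site that served the file best on that request


def knn_select_replica(log, file_id, candidate_sites, k=3):
    """Pick a replica site by majority vote among the k nearest log records."""
    # Keep only records that point at sites currently holding a replica.
    usable = [r for r in log if r.site in candidate_sites]
    if not usable:
        return None  # no local history; the caller needs another strategy
    # Distance is the absolute difference of file IDs (an assumed metric).
    # sorted() is stable, so equally distant records keep their log order.
    nearest = sorted(usable, key=lambda r: abs(r.file_id - file_id))[:k]
    votes = Counter(r.site for r in nearest)
    # On a tied vote, most_common returns the site seen first among the neighbors.
    return votes.most_common(1)[0][0]


log = [
    TransferRecord(10, "site_a"),
    TransferRecord(11, "site_b"),
    TransferRecord(12, "site_b"),
    TransferRecord(40, "site_c"),
]
best = knn_select_replica(
    log, file_id=11, candidate_sites={"site_a", "site_b", "site_c"}, k=3
)
print(best)  # site_b: two of the three nearest records point at it
```

Returning `None` when no history is available leaves room for a fallback such as a replica catalog lookup; how the paper handles that case is not stated in the abstract.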