Replica selection in grid environment: a data-mining approach

Authors:
Rashedur M. Rahman;Ken Barker;Reda Alhajj
Affiliations:
University of Calgary, Canada;University of Calgary, Canada;University of Calgary, Canada
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005


Abstract

Grid technology was developed to share data across many organizations in different geographical locations. Replication stores copies of data at different locations to improve data access performance. When several sites hold replicas, selecting the best replica yields significant benefits. Current research shows that both network bandwidth and disk I/O play a major role in file transfer. In this paper, we describe a new optimization technique that considers both disk throughput and network latency when selecting the best replica. The history of previous data transfers can help predict the best site holding a replica. The k-nearest neighbor rule is one such predictive technique: when a new request for the best replica arrives, it examines all previous data to find a subset of previous file requests similar to the new one and uses them to predict the best site holding the replica. In this work, we implement and test the k-nearest neighbor algorithm for various file access patterns and compare the results with the traditional replica catalog based model. The results demonstrate that our model outperforms the traditional model for sequential and unitary random file access requests.
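
The k-nearest neighbor rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature set, the distance measure, or the value of k, so the two-element feature tuples, the Euclidean distance, the majority vote, and the names `predict_best_site`, `site_a`, and `site_b` are all assumptions chosen for illustration. The sketch also assumes the features have already been scaled to comparable ranges.

```python
import math
from collections import Counter


def predict_best_site(history, request, k=3):
    """Predict the replica site for a new request by majority vote
    among the k most similar past requests.

    history: list of (features, site) pairs from earlier transfers,
             where features is a tuple of numbers (hypothetical,
             pre-normalized attributes of a file request).
    request: feature tuple of the new request.
    """
    if not history:
        raise ValueError("history must contain at least one past request")
    # Rank past requests by Euclidean distance to the new request
    # (the distance measure is an assumption, not taken from the paper).
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], request))[:k]
    # The site that served most of the k nearest requests wins the vote.
    votes = Counter(site for _, site in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical transfer history: (normalized features, best site observed).
history = [
    ((0.10, 0.20), "site_a"),
    ((0.15, 0.25), "site_a"),
    ((0.80, 0.90), "site_b"),
    ((0.85, 0.80), "site_b"),
    ((0.20, 0.10), "site_a"),
]

print(predict_best_site(history, (0.12, 0.22), k=3))
print(predict_best_site(history, (0.90, 0.90), k=3))
```

In practice the history would hold measured transfer records, and k would be tuned against the observed access pattern; neither choice is fixed by the abstract.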