Using classification techniques to improve replica selection in data grid

Authors:
Hai Jin;Jin Huang;Xia Xie;Qin Zhang
Affiliations:
Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, China (all authors)
Venue:
ODBASE'06/OTM'06 Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part II
Year:
2006

Abstract

A data grid facilitates the sharing of data and resources located in different parts of the world. The major barrier to fast data access in a data grid is the high latency of wide area networks and the Internet. Data replication is adopted to improve data access performance. When different sites hold replicas, selecting the best replica brings significant benefits. In this paper, we propose a new replica selection strategy based on classification techniques. In this strategy, the replica selection problem is treated as a classification problem. The data transfer history is used to help predict the best site holding the replica. A switch mechanism in the replica selection model avoids wasting time on inaccurate classification results. We study and simulate the KNN and SVM methods for different file access patterns and compare the results with the traditional replica catalog model. The results show that our replica selection model outperforms the traditional one for certain file access requests.
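
To make the idea of replica selection as classification concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the features, the distance measure, or how the switch mechanism decides, so everything below is an assumption made for illustration: the feature vectors, the names `knn_predict`, `select_replica`, and `catalog_lookup`, the vote-share threshold `min_confidence`, and the reading of the switch mechanism as "fall back to the traditional replica catalog when the classifier's answer looks unreliable".

```python
from collections import Counter
from math import dist


def knn_predict(history, query, k=3):
    """Plain k-nearest-neighbour vote over past transfers.

    history: list of (feature_vector, best_site) pairs (hypothetical format).
    Returns the majority site among the k nearest records and its vote share.
    """
    nearest = sorted(history, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(site for _, site in nearest)
    site, count = votes.most_common(1)[0]
    return site, count / len(nearest)


def select_replica(history, query, holders, catalog_lookup,
                   k=3, min_confidence=0.6):
    """Pick a replica site, switching to the catalog when unsure.

    holders: set of sites that currently hold a replica of the file.
    catalog_lookup: callable standing in for the traditional replica
    catalog model (assumed interface: takes holders, returns one site).
    """
    if history:
        site, confidence = knn_predict(history, query, k)
        # Assumed switch rule: trust the classifier only if the vote is
        # strong enough and the predicted site really holds a replica.
        if confidence >= min_confidence and site in holders:
            return site, "classifier"
    return catalog_lookup(holders), "catalog"
```

A caller would pass the transfer history as labelled feature vectors (for example file size and time of day, both assumed here) and receive a site together with the path that produced it. Any site predicted by the classifier but missing from `holders` triggers the catalog path, so a stale prediction costs no failed transfer attempt in this sketch.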