Replica selection in grid environment: a data-mining approach

  • Authors:
  • Rashedur M. Rahman;Ken Barker;Reda Alhajj

  • Affiliations:
  • University of Calgary, Canada;University of Calgary, Canada;University of Calgary, Canada

  • Venue:
  • Proceedings of the 2005 ACM symposium on Applied computing
  • Year:
  • 2005

Abstract

Grid technology is developed to share data across many organizations in different geographical locations. Replication stores copies of data at different locations to improve data access performance. When several sites hold replicas, selecting the best replica yields significant benefits. Current research shows that both network bandwidth and disk I/O play a major role in file transfer. In this paper, we describe a new optimization technique that considers both disk throughput and network latency when selecting the best replica. The history of previous data transfers can help predict the best site holding a replica. The k-nearest neighbor rule is one such predictive technique: when a new request for the best replica arrives, it examines all previous data to find a subset of previous file requests that are similar to the new one and uses them to predict the best site holding the replica. In this work, we implement and test the k-nearest neighbor algorithm for various file access patterns and compare the results with the traditional replica catalog based model. The results demonstrate that our model outperforms the traditional model for sequential and unitary random file access requests.
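
The k-nearest neighbor rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the request features, the distance measure, or the value of k, so the two-component numeric feature tuples, the Euclidean distance, the majority vote, and the names `predict_best_site`, `history`, and `site_a` to `site_c` are all assumptions made for illustration.

```python
import math
from collections import Counter


def predict_best_site(history, request, k=3):
    """Predict the best replica site for a new request.

    history: list of (features, best_site) pairs from past transfers,
             where features is a tuple of numbers (hypothetical layout).
    request: feature tuple of the new request, same length as above.
    """
    if not history:
        raise ValueError("no transfer history available")
    # Rank past requests by Euclidean distance to the new request
    # (assumed metric; the abstract does not name one).
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], request))
    # Majority vote over the sites of the k most similar past requests.
    votes = Counter(site for _, site in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up history: each entry pairs a feature tuple with the site that
# served that past request best.
history = [
    ((10.0, 1.0), "site_a"),
    ((12.0, 1.5), "site_a"),
    ((95.0, 8.0), "site_b"),
    ((100.0, 9.0), "site_b"),
    ((11.0, 0.5), "site_c"),
]

print(predict_best_site(history, (11.0, 1.0), k=3))  # prints "site_a"
```

In this toy data the three nearest past requests vote two to one for `site_a`. When features have very different scales, they would normally be normalized before computing distances so that no single feature dominates.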