Data Replication Strategies in Grid Environments

  • Venue: ICA3PP '02 Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing
  • Year: 2002

Abstract

Data Grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address this problem, we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are made based on a cost model that evaluates the data access costs and performance gains of creating each replica. The estimation of costs and gains is based on factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that represent propagation graphs that minimize inter-replica communication costs. To evaluate our model, we use the network simulator NS to study the impact of replication. Our results show that replication improves the performance of data access on the Data Grid, and that the gain increases with the size of the data used.
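
The abstract names the inputs to the cost model (read/write statistics, response time, bandwidth, replica size) but does not give its formulas. The sketch below is only an illustration of how such a decision rule might look, not the paper's model. Everything in it is an assumption: the `AccessStats` fields, the function names, the idea that each remote read transfers the whole data set, and the idea that each write is propagated to the replica as a full transfer.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Hypothetical run-time statistics for one data set at one candidate site."""
    reads: int             # reads issued from this site in the observation window
    writes: int            # writes to the data set in the same window
    size_mb: float         # replica size in megabytes
    bandwidth_mbps: float  # bandwidth to the nearest existing copy, megabits/s
    latency_s: float       # latency to the nearest existing copy, seconds


def transfer_time(stats: AccessStats) -> float:
    """Seconds needed to move the whole data set over the link once."""
    return stats.latency_s + (stats.size_mb * 8.0) / stats.bandwidth_mbps


def replication_gain(stats: AccessStats) -> float:
    """Assumed gain: each read served locally saves one remote transfer."""
    return stats.reads * transfer_time(stats)


def replication_cost(stats: AccessStats) -> float:
    """Assumed cost: one transfer to create the replica, plus one
    transfer per write to keep the replica consistent."""
    return (1 + stats.writes) * transfer_time(stats)


def should_replicate(stats: AccessStats) -> bool:
    """Create the replica only if the estimated gain exceeds the cost."""
    return replication_gain(stats) > replication_cost(stats)


if __name__ == "__main__":
    read_heavy = AccessStats(reads=100, writes=2, size_mb=100.0,
                             bandwidth_mbps=100.0, latency_s=0.5)
    write_heavy = AccessStats(reads=3, writes=10, size_mb=100.0,
                              bandwidth_mbps=100.0, latency_s=0.5)
    print(should_replicate(read_heavy))   # read-dominated access pattern
    print(should_replicate(write_heavy))  # write-dominated access pattern
```

Under these simplified assumptions, a read-dominated site favors creating a replica and a write-dominated one does not, because every write adds a propagation cost. A model closer to the one described in the abstract would also have to account for the replica topology and measured response times.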