On the Impact of Replica Placement to the Reliability of Distributed Brick Storage Systems

Authors:
Qiao Lian;Wei Chen;Zheng Zhang
Affiliations:
Microsoft Research Asia;Microsoft Research Asia;Microsoft Research Asia
Venue:
ICDCS '05 Proceedings of the 25th IEEE International Conference on Distributed Computing Systems
Year:
2005

Citing 0
Cited 10

BitVault: a highly reliable distributed data retention platform

ACM SIGOPS Operating Systems Review - Systems work at Microsoft Research
RADOS: a scalable, reliable storage service for petabyte-scale storage clusters

PDSW '07 Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07
Exploring data reliability tradeoffs in replicated storage systems

Proceedings of the 18th ACM international symposium on High performance distributed computing
Churn-Resilient Replication Strategy for Peer-to-Peer Distributed Hash-Tables

SSS '09 Proceedings of the 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems
Topology-aware replica placement in fault-tolerant embedded networks

ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
Differentiated replication strategy in data centers

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Data life time for different placement policies in P2P storage systems

Globe'10 Proceedings of the Third international conference on Data management in grid and peer-to-peer systems
Availability in globally distributed storage systems

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
RelaxDHT: A churn-resilient replication strategy for peer-to-peer distributed hash-tables

ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Hierarchical RAID: Design, performance, reliability, and recovery

Journal of Parallel and Distributed Computing

Abstract

Data reliability of distributed brick storage systems critically depends on the replica placement policy, and the two governing forces are repair speed and sensitivity to multiple concurrent failures. In this paper, we provide an analytical framework to reason about and quantify the impact of replica placement policy on system reliability. The novelty of the framework is its consideration of the bounded network bandwidth available for data maintenance. We apply the framework to two popular schemes, namely sequential placement and random placement, and show that both have drawbacks that significantly degrade data reliability. We then propose the stripe placement scheme and find a near-optimal configuration parameter with which it provides much better reliability. We further discuss the possibility of addressing the problem of correlated brick failures in our analytical framework.
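
The tension between the two forces named in the abstract can be illustrated with a small sketch. It is not the paper's analytical framework: it ignores the bandwidth bound and does not model stripe placement. The function names, the ring-style definition of sequential placement, and the parameters (20 bricks, 2000 objects, 3 replicas) are all assumptions chosen for illustration. For each policy it counts how many bricks can serve as repair sources after one brick fails (a rough proxy for repair speed) and what fraction of simultaneous k-brick failures would destroy every replica of some object (a rough proxy for sensitivity to concurrent failures).

```python
import math
import random


def sequential_placement(num_bricks, num_objects, k):
    """Assumed definition: object i lives on k consecutive bricks of a ring."""
    return [
        frozenset((i + j) % num_bricks for j in range(k))
        for i in range(num_objects)
    ]


def random_placement(num_bricks, num_objects, k, rng):
    """Each object lives on k distinct bricks chosen uniformly at random."""
    return [
        frozenset(rng.sample(range(num_bricks), k))
        for _ in range(num_objects)
    ]


def repair_sources(placement, failed):
    """Bricks holding a surviving replica of any object stored on `failed`."""
    sources = set()
    for replicas in placement:
        if failed in replicas:
            sources |= replicas - {failed}
    return sources


def fatal_fraction(placement, num_bricks, k):
    """Fraction of k-brick failure sets that wipe out all replicas of an object.

    Every object has exactly k replicas, so a k-brick failure set is fatal
    exactly when it equals some object's replica set.
    """
    return len(set(placement)) / math.comb(num_bricks, k)


if __name__ == "__main__":
    n, objects, k = 20, 2000, 3  # illustrative values, not from the paper
    rng = random.Random(0)
    policies = {
        "sequential": sequential_placement(n, objects, k),
        "random": random_placement(n, objects, k, rng),
    }
    for name, placement in policies.items():
        print(
            name,
            "repair sources:", len(repair_sources(placement, failed=5)),
            "fatal fraction:", round(fatal_fraction(placement, n, k), 4),
        )
```

Under these assumptions, sequential placement leaves only a handful of neighbouring bricks to repair from but exposes few fatal failure combinations, while random placement spreads repair across many more bricks at the cost of many more fatal combinations. This matches the abstract's claim that both schemes have drawbacks.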