Symmetric active/active metadata service for high availability parallel file systems

Authors:
Xubin He; Li Ou; Christian Engelmann; Xin Chen; Stephen L. Scott
Affiliations:
Department of Electrical and Computer Engineering, Tennessee Technological University, United States; Scalable Systems Group, Dell Inc., United States; Computer Science and Mathematics Division, Oak Ridge National Laboratory, United States; Department of Electrical and Computer Engineering, Tennessee Technological University, United States; Computer Science and Mathematics Division, Oak Ridge National Laboratory, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

High-availability data storage systems are critical for many applications as research and business become more data driven. Because metadata management is essential to system availability, distributed storage systems use multiple metadata services to improve availability. Past research has focused on the active/standby model, in which each active service has at least one redundant idle backup. However, depending on the replication technique used, a fail-over may interrupt service and even lose some service state. In addition, the replication overhead for multiple metadata services can be very high. This paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, a service node failure does not trigger a fail-over to a backup, and there is no disruption of service or loss of service state. A fast delivery protocol is also discussed to reduce the latency of the required total order broadcast. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution.
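
The core idea of the symmetric active/active model can be illustrated with a minimal sketch: if every replica applies the same metadata operations in the same total order, all replicas hold identical state, so the loss of one node requires no fail-over. The sketch below is not the paper's implementation or its fast delivery protocol. The class `MetadataReplica`, the operation names, and the use of pre-assigned sequence numbers as a stand-in for a total order broadcast are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReplica:
    """Hypothetical active metadata node holding a full copy of the namespace."""
    name: str
    namespace: dict = field(default_factory=dict)
    next_seq: int = 0                            # next sequence number to apply
    pending: dict = field(default_factory=dict)  # hold-back buffer: seq -> op

    def deliver(self, seq, op):
        """Buffer a message, then apply every operation that is next in order."""
        self.pending[seq] = op
        while self.next_seq in self.pending:
            self._apply(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def _apply(self, op):
        """Apply one operation deterministically (assumed operation set)."""
        kind, path, value = op
        if kind == "create":
            self.namespace.setdefault(path, value)
        elif kind == "setattr" and path in self.namespace:
            self.namespace[path] = value
        elif kind == "remove":
            self.namespace.pop(path, None)


def run_demo():
    """Deliver the same stamped operations to three replicas in different orders."""
    replicas = [MetadataReplica(f"mds{i}") for i in range(3)]
    ops = [
        ("create", "/data/a", {"size": 0}),
        ("setattr", "/data/a", {"size": 4096}),
        ("create", "/data/b", {"size": 0}),
        ("remove", "/data/b", None),
    ]
    # Assumption: sequence numbers stand in for the order agreed on by a
    # total order broadcast; how that order is agreed on is not modeled here.
    stamped = list(enumerate(ops))
    arrival_orders = [stamped, stamped[::-1], stamped[1:] + stamped[:1]]
    for replica, arrivals in zip(replicas, arrival_orders):
        for seq, op in arrivals:
            replica.deliver(seq, op)
    return replicas


if __name__ == "__main__":
    for replica in run_demo():
        print(replica.name, replica.namespace)
```

Because each replica buffers early arrivals and applies operations strictly by sequence number, network reordering does not affect the final state. If one replica is removed from the list, the remaining replicas still hold the complete namespace and can keep serving requests, which is the property the abstract attributes to the symmetric active/active model.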