Hadoop is widely adopted to support data-intensive distributed applications. Many of these applications are mission critical and therefore require Hadoop itself to be highly available. Unfortunately, Hadoop offers no high-availability support yet, and adding it is not trivial. Based on a thorough investigation of Hadoop, this paper proposes a metadata-replication-based solution that enables high availability by removing Hadoop's single point of failure. The solution involves three major phases: in the initialization phase, each standby/slave node registers with the active/primary node and its initial metadata (such as the version file and file system image) is caught up with that of the active/primary node; in the replication phase, the runtime metadata needed for future failover (such as outstanding operations and lease states) is replicated; in the failover phase, the standby/newly elected primary node takes over all communications. The solution also offers several features unique to Hadoop, such as a runtime-configurable synchronization mode. Experiments demonstrate the feasibility and efficiency of our solution.
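To make the replication phase and the runtime-configurable synchronization mode more concrete, the sketch below shows how a primary node might ship runtime metadata records to a standby in either synchronous or asynchronous mode. This is a minimal illustration under assumed names, not the paper's implementation or Hadoop's API: the class, interface, and method names (MetadataReplicator, StandbyChannel, sendAndAwaitAck) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative sketch (hypothetical names, not the paper's code): a
 * primary-side replicator that ships runtime metadata records (e.g.
 * serialized edit-log entries or lease updates) to a standby node, with a
 * synchronization mode that can be changed at runtime.
 */
public class MetadataReplicator {

    /** How the primary waits for the standby before acknowledging a client. */
    public enum SyncMode { SYNC, ASYNC }

    /** Minimal stand-in for the transport channel to the standby node. */
    public interface StandbyChannel {
        /** Sends one metadata record and blocks until the standby acknowledges it. */
        void sendAndAwaitAck(byte[] record) throws InterruptedException;
    }

    private volatile SyncMode mode;                       // runtime-configurable
    private final StandbyChannel standby;
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();

    public MetadataReplicator(StandbyChannel standby, SyncMode initialMode) {
        this.standby = standby;
        this.mode = initialMode;
        // Background thread drains the queue for records submitted in ASYNC mode.
        Thread asyncSender = new Thread(() -> {
            try {
                while (true) {
                    standby.sendAndAwaitAck(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // shut down quietly
            }
        }, "async-metadata-sender");
        asyncSender.setDaemon(true);
        asyncSender.start();
    }

    /** Switches between synchronous and asynchronous replication at runtime. */
    public void setMode(SyncMode newMode) {
        this.mode = newMode;
    }

    /**
     * Called by the primary after it applies a metadata mutation locally.
     * SYNC: returns only after the standby has acknowledged the record.
     * ASYNC: queues the record and lets the background thread ship it.
     */
    public void replicate(byte[] record) throws InterruptedException {
        if (mode == SyncMode.SYNC) {
            standby.sendAndAwaitAck(record);
        } else {
            pending.put(record);
        }
    }
}
```

The trade-off this sketch is meant to highlight: synchronous mode ensures the standby holds every acknowledged metadata record before the primary proceeds, at the cost of added latency per operation, while asynchronous mode keeps primary-side latency low but may lose records that were queued but not yet acknowledged when the primary fails.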