Hadoop is widely adopted to support data-intensive distributed applications. Many of these applications are mission critical and therefore require Hadoop itself to be highly available. Unfortunately, Hadoop offers no high-availability support yet, and adding it is not trivial. Based on a thorough investigation of Hadoop, this paper proposes a metadata-replication-based solution that enables high availability by removing Hadoop's single point of failure. The solution involves three major phases: in the initialization phase, each standby/slave node registers with the active/primary node and its initial metadata (such as the version file and file system image) is caught up with that of the active/primary node; in the replication phase, the runtime metadata needed for future failover (such as outstanding operations and lease states) is replicated; in the failover phase, the standby/newly elected primary node takes over all communications. The solution also offers several features unique to Hadoop, such as a runtime-configurable synchronization mode. Experiments demonstrate the feasibility and efficiency of our solution.
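To make the replication phase and the runtime-configurable synchronization mode more concrete, the sketch below shows how a primary node might ship runtime metadata records to a standby in either synchronous or asynchronous mode. This is a minimal illustration under assumed names, not the paper's implementation or Hadoop's API: the class, interface, and method names (MetadataReplicator, StandbyChannel, sendAndAwaitAck) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative sketch (hypothetical names, not the paper's code): a
 * primary-side replicator that ships runtime metadata records (e.g.
 * serialized edit-log entries or lease updates) to a standby node, with a
 * synchronization mode that can be changed at runtime.
 */
public class MetadataReplicator {

    /** How the primary waits for the standby before acknowledging a client. */
    public enum SyncMode { SYNC, ASYNC }

    /** Minimal stand-in for the transport channel to the standby node. */
    public interface StandbyChannel {
        /** Sends one metadata record and blocks until the standby acknowledges it. */
        void sendAndAwaitAck(byte[] record) throws InterruptedException;
    }

    private volatile SyncMode mode;                       // runtime-configurable
    private final StandbyChannel standby;
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();

    public MetadataReplicator(StandbyChannel standby, SyncMode initialMode) {
        this.standby = standby;
        this.mode = initialMode;
        // Background thread drains the queue for records submitted in ASYNC mode.
        Thread asyncSender = new Thread(() -> {
            try {
                while (true) {
                    standby.sendAndAwaitAck(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // shut down quietly
            }
        }, "async-metadata-sender");
        asyncSender.setDaemon(true);
        asyncSender.start();
    }

    /** Switches between synchronous and asynchronous replication at runtime. */
    public void setMode(SyncMode newMode) {
        this.mode = newMode;
    }

    /**
     * Called by the primary after it applies a metadata mutation locally.
     * SYNC: returns only after the standby has acknowledged the record.
     * ASYNC: queues the record and lets the background thread ship it.
     */
    public void replicate(byte[] record) throws InterruptedException {
        if (mode == SyncMode.SYNC) {
            standby.sendAndAwaitAck(record);
        } else {
            pending.put(record);
        }
    }
}
```

The trade-off this sketch is meant to highlight: synchronous mode ensures the standby holds every acknowledged metadata record before the primary proceeds, at the cost of added latency per operation, while asynchronous mode keeps primary-side latency low but may lose records that were queued but not yet acknowledged when the primary fails.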