A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services

  • Authors:
  • Jing Zhang; Gongqing Wu; Xuegang Hu; Xindong Wu

  • Venue:
  • GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
  • Year:
  • 2012

Abstract

Improving file access performance is a major challenge in real-time cloud services. In this paper, we analyze the preconditions for addressing this problem, considering the requirements and the hardware, software, and network environments of the cloud. We then describe the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System (HDFS), named the HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. Each cache service is designed with three access layers: an in-memory cache, a snapshot of the local disk, and the actual disk view as provided by HDFS. Files loaded from HDFS are cached in shared memory, which the client library can access directly. Multiple applications integrated with the client library can access a cache service simultaneously. Cache services are organized in a peer-to-peer (P2P) style using a distributed hash table. Every cached file has three replicas on different cache service nodes to improve robustness and alleviate the workload. Experimental results show that the cache system can store files of widely varying sizes and achieves millisecond-level access performance in highly concurrent environments.
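
The two mechanisms named in the abstract, a three-layer read path and three-way replica placement through a distributed hash table, can be illustrated with a small Python sketch. This is not the HDCache implementation: the class names `HashRing` and `CacheService`, the use of consistent hashing with MD5 and virtual nodes, and the promotion of files into faster layers on a miss are all assumptions made for illustration, and plain dictionaries stand in for shared memory, the local disk snapshot, and HDFS.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring standing in for the DHT (hypothetical design)."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to spread load more evenly.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]
        self._node_count = len(set(nodes))

    def replicas(self, path, count=3):
        """Return `count` distinct nodes, walking clockwise from the key's point."""
        count = min(count, self._node_count)
        start = bisect.bisect(self._keys, _hash(path))
        chosen = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


class CacheService:
    """One cache node with three access layers, simulated with dicts."""

    def __init__(self, hdfs):
        self.memory = {}    # layer 1: in-memory cache
        self.snapshot = {}  # layer 2: snapshot of the local disk
        self.hdfs = hdfs    # layer 3: backing store (stands in for HDFS)

    def read(self, path):
        """Return (data, layer) and promote the file into the faster layers."""
        if path in self.memory:
            return self.memory[path], "memory"
        if path in self.snapshot:
            self.memory[path] = self.snapshot[path]
            return self.memory[path], "snapshot"
        data = self.hdfs[path]  # raises KeyError if the file does not exist
        self.snapshot[path] = data
        self.memory[path] = data
        return data, "hdfs"


if __name__ == "__main__":
    hdfs = {"/logs/a.txt": b"hello"}
    nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
    ring = HashRing(nodes)
    services = {name: CacheService(hdfs) for name in nodes}

    owners = ring.replicas("/logs/a.txt")  # three distinct cache nodes
    primary = services[owners[0]]
    print(owners)
    print(primary.read("/logs/a.txt"))  # first read falls through to the HDFS layer
    print(primary.read("/logs/a.txt"))  # second read is served from memory
```

In this sketch a client would hash the file path, pick any of the three replica nodes, and read through that node's layers. Spreading reads over the replicas is one plausible way the replication could ease per-node load, though the abstract does not say how HDCache selects among them.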