A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services

Authors:
Jing Zhang; Gongqing Wu; Xuegang Hu; Xindong Wu
Venue:
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Year:
2012

Abstract

Improving file access performance is a major challenge in real-time cloud services. In this paper, we analyze the preconditions for addressing this problem, considering the requirements as well as the hardware, software, and network environments in the cloud. We then describe the design and implementation of a novel distributed layered cache system built on top of the Hadoop Distributed File System, named the HDFS-based Distributed Cache System (HDCache). The cache system consists of a client library and multiple cache services. The cache services are designed with three access layers: an in-memory cache, a snapshot of the local disk, and the actual disk view as provided by HDFS. Files loaded from HDFS are cached in shared memory, which the client library can access directly. Multiple applications integrated with the client library can access a cache service simultaneously. Cache services are organized in a P2P style using a distributed hash table. Every cached file has three replicas on different cache service nodes in order to improve robustness and alleviate the workload. Experimental results show that the cache system can store files of widely varying sizes and delivers millisecond-level access performance in highly concurrent environments.
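The replica placement and layered lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function, the ring layout, or how data moves between layers. The names `HashRing`, `replicas`, `layered_read`, and `load_from_hdfs`, the use of MD5, the virtual-node count, and the policy of copying data into the faster layers on a miss are all assumptions made for illustration. Plain dictionaries stand in for the shared-memory cache and the local-disk snapshot.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring over cache service nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at `vnodes` points to even out the key distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, path, count=3):
        """Return up to `count` distinct nodes responsible for `path`."""
        count = min(count, self._node_count)
        i = bisect.bisect_left(self._points, _hash(path))
        chosen = []
        # Walk clockwise from the file's position, skipping repeated nodes.
        while len(chosen) < count:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


def layered_read(path, memory, snapshot, load_from_hdfs):
    """Try memory, then the local snapshot, then HDFS (assumed promotion policy)."""
    if path in memory:
        return memory[path], "memory"
    if path in snapshot:
        data, layer = snapshot[path], "snapshot"
    else:
        # `load_from_hdfs` is a caller-supplied stand-in for a real HDFS read.
        data, layer = load_from_hdfs(path), "hdfs"
        snapshot[path] = data
    memory[path] = data  # keep a copy in the fastest layer for later reads
    return data, layer


if __name__ == "__main__":
    ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
    print(ring.replicas("/user/demo/report.pdf"))

    memory, snapshot = {}, {}
    fake_hdfs = lambda p: b"contents of " + p.encode("utf-8")
    print(layered_read("/user/demo/report.pdf", memory, snapshot, fake_hdfs)[1])
    print(layered_read("/user/demo/report.pdf", memory, snapshot, fake_hdfs)[1])
```

In this sketch a client would ask the ring for the three nodes holding a file and contact any of them, which spreads read load and tolerates the loss of a node. A repeated read of the same path is served from the in-memory layer.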