Using Memcached to Promote Read Throughput in Massive Small-File Storage System

  • Authors:
  • Chuncong Xu;Xiaomeng Huang;Nuo Wu;Pengzhi Xu;Guangwen Yang

  • Venue:
  • GCC '10 Proceedings of the 2010 Ninth International Conference on Grid and Cloud Computing
  • Year:
  • 2010

Abstract

The disk I/O bottleneck limits both the data throughput and the latency of disk-based distributed file systems, making it a significant challenge for such systems to meet the performance demands of massive small-file storage. Caching has been widely used in storage systems to improve data access performance. To support the storage of massive small files, we integrated memcached into our distributed file system. However, an eviction problem arises from memcached's LRU replacement algorithm: non-stale objects may be evicted to make room for large short-lived objects. We therefore propose Prioritized Cache (PC) and Prioritized Cache Management (PCM) to solve this problem. The memcached cache is reorganized into a permanent cache and a temporary cache. Furthermore, to alleviate the side effect of sequential access on the hit rate, the temporary cache is partitioned into parts with different priorities and managed according to those priorities. We implemented and evaluated the integrated prototype system. The experimental results show that the improved distributed file system with a distributed object cache delivers high performance for small-file storage. Compared with the original system, small-file read throughput increased by a factor of 2.65 to 8.05, with no write performance degradation.
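The abstract does not give implementation details of PC/PCM, but the core idea of a permanent region exempt from eviction plus a priority-partitioned temporary region can be illustrated with a minimal sketch. The class below is a hypothetical illustration, not the authors' implementation: the number of priority classes, the capacity accounting, and the eviction order (lowest priority first, LRU within a class) are assumptions.

```python
from collections import OrderedDict


class PrioritizedCache:
    """Sketch of a prioritized cache (assumptions, not the paper's PCM):
    a permanent region that is never evicted, plus a temporary region split
    into priority classes. Eviction removes the least recently used entry
    of the lowest-priority non-empty class first."""

    def __init__(self, temp_capacity, num_priorities=3):
        self.permanent = {}                  # permanent cache: exempt from eviction
        self.temp_capacity = temp_capacity   # max number of temporary entries (assumed unit)
        # one LRU list (OrderedDict) per priority class; index 0 = lowest priority
        self.temporary = [OrderedDict() for _ in range(num_priorities)]

    def put_permanent(self, key, value):
        self.permanent[key] = value

    def put_temporary(self, key, value, priority=0):
        lru = self.temporary[priority]
        lru[key] = value
        lru.move_to_end(key)                 # mark as most recently used
        self._evict_if_needed()

    def get(self, key):
        if key in self.permanent:
            return self.permanent[key]
        for lru in self.temporary:
            if key in lru:
                lru.move_to_end(key)         # refresh recency on a hit
                return lru[key]
        return None                          # cache miss

    def _evict_if_needed(self):
        # Drop entries from the lowest-priority class first, LRU order within a class.
        while sum(len(lru) for lru in self.temporary) > self.temp_capacity:
            for lru in self.temporary:
                if lru:
                    lru.popitem(last=False)  # remove least recently used entry
                    break
```

In this sketch, frequently reused metadata or hot small-file objects would be placed in the permanent region, while objects from sequential scans would enter a low-priority temporary class so they cannot displace higher-priority entries; how the actual system assigns objects to regions and priorities is not specified in the abstract.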