Metadata-Aware Small Files Storage Architecture on Hadoop

Authors:
Xiaoyong Zhao; Yang Yang; Li-li Sun; Han Huang
Affiliations:
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China (all authors)
Venue:
WISM'12: Proceedings of the 2012 International Conference on Web Information Systems and Mining
Year:
2012

Abstract

Zettabytes (ZB; one ZB is a trillion GB) of data are produced globally every year, which is making distributed data storage a trend. Research on and applications of Hadoop, the most representative open-source distributed file system, are increasing. However, Hadoop is not well suited to handling massive numbers of small files. This paper presents a metadata-aware storage architecture for massive small files. The architecture takes full advantage of file metadata: a merge module uses a classification algorithm to merge the small files into SequenceFile format, and an efficient indexing mechanism is introduced. Together these provide a good solution to the NameNode memory bottleneck. Taking MP3 files as an example, experiments show that the architecture achieves good results.
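The abstract does not specify the classification algorithm or the index format, so the following Python sketch only illustrates the general idea under stated assumptions. Files are bucketed by a single metadata field (a hypothetical `genre` tag, such as might be parsed from MP3 tags), each bucket is concatenated into one container that stands in for a Hadoop SequenceFile, and an in-memory index maps each original file name to its container, offset, and length. It does not use the Hadoop API, and `SmallFile`, `merge_small_files`, `read_file`, and `namenode_bytes` are hypothetical names. The memory estimate uses the widely cited rule of thumb from Hadoop: The Definitive Guide that each file, directory, and block object costs roughly 150 bytes of NameNode memory.

```python
"""Hypothetical sketch of metadata-aware small-file merging with an index.

Not the authors' algorithm and not the Hadoop API: containers are plain
byte strings standing in for SequenceFiles.
"""
from collections import defaultdict
from dataclasses import dataclass

# Rule-of-thumb NameNode cost per file, directory, or block object (bytes).
BYTES_PER_NAMESPACE_OBJECT = 150


@dataclass
class SmallFile:
    name: str
    data: bytes
    metadata: dict  # hypothetical parsed tags, e.g. {"genre": "rock"}


def classify(f: SmallFile, key: str) -> str:
    """Hypothetical classifier: bucket a file by one metadata field."""
    return f.metadata.get(key, "unknown")


def merge_small_files(files, key="genre"):
    """Group files by metadata, concatenate each group into one container,
    and record name -> (container, offset, length) in an index."""
    groups = defaultdict(list)
    for f in files:
        groups[classify(f, key)].append(f)

    containers, index = {}, {}
    for group, members in groups.items():
        buf = bytearray()
        for f in members:
            # Offset is the current end of the container before appending.
            index[f.name] = (group, len(buf), len(f.data))
            buf.extend(f.data)
        containers[group] = bytes(buf)
    return containers, index


def read_file(containers, index, name):
    """Recover one original small file via a single index lookup and slice."""
    group, offset, length = index[name]
    return containers[group][offset:offset + length]


def namenode_bytes(num_files, num_blocks):
    """Rough NameNode memory estimate for a flat namespace (directories ignored)."""
    return (num_files + num_blocks) * BYTES_PER_NAMESPACE_OBJECT


if __name__ == "__main__":
    files = [
        SmallFile("a.mp3", b"AAA", {"genre": "rock"}),
        SmallFile("b.mp3", b"BB", {"genre": "jazz"}),
        SmallFile("c.mp3", b"C", {"genre": "rock"}),
    ]
    containers, index = merge_small_files(files)
    print(sorted(containers))                       # ['jazz', 'rock']
    print(read_file(containers, index, "c.mp3"))    # b'C'
    # One million single-block small files vs. the same data in 1,000 merged
    # containers of (hypothetically) 8 blocks each.
    print(namenode_bytes(1_000_000, 1_000_000))     # 300000000
    print(namenode_bytes(1_000, 8_000))             # 1350000
```

The 1,000-container figures are illustrative, not taken from the paper's experiments. The trade-off the sketch exposes is that the NameNode only tracks the merged containers, while every read of an original small file pays for an extra index lookup, and that index must itself be stored somewhere.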