Metadata-Aware Small Files Storage Architecture on Hadoop

  • Authors:
  • Xiaoyong Zhao;Yang Yang;Li-li Sun;Han Huang

  • Affiliations:
  • School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China;School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China;School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China;School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China

  • Venue:
  • WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
  • Year:
  • 2012

Abstract

Data produced globally each year has reached the ZB (trillion GB) scale, making distributed data storage a trend. Research on and application of Hadoop, the most representative open source distributed file system, is increasing. However, Hadoop is not well suited to handling massive numbers of small files. This paper presents a metadata-aware storage architecture for massive small files. It takes full advantage of file metadata, merges small files into a Sequence File using the classification algorithm of the merge module, and introduces an efficient indexing mechanism, which together address the NameNode memory bottleneck. Taking MP3 files as an example, the experiments show that the architecture obtains good results.
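
The merge-and-index idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classification algorithm or the index layout, so the grouping rule (an `artist` metadata field), the `IndexEntry` record, and the function names `merge_small_files` and `read_file` are all assumptions made for illustration. In-memory byte strings stand in for Hadoop Sequence Files, and no Hadoop API is used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    container: str  # name of the merged container (stand-in for a Sequence File)
    offset: int     # byte offset of the file's content inside the container
    length: int     # content length in bytes


def merge_small_files(files, classify):
    """Group small files by a metadata-derived key and merge each group.

    files: iterable of (name, metadata_dict, content_bytes)
    classify: maps a metadata dict to a group key (hypothetical rule)
    Returns (containers, index): merged bytes per group key, and a
    name -> IndexEntry lookup table.
    """
    groups = defaultdict(list)
    for name, meta, content in files:
        groups[classify(meta)].append((name, content))

    containers, index = {}, {}
    for key, members in groups.items():
        buf = bytearray()
        for name, content in members:
            # Record where this file starts before appending its bytes.
            index[name] = IndexEntry(key, len(buf), len(content))
            buf.extend(content)
        containers[key] = bytes(buf)
    return containers, index


def read_file(containers, index, name):
    """Fetch one original file using only the index and its container."""
    entry = index[name]
    data = containers[entry.container]
    return data[entry.offset:entry.offset + entry.length]


# Example: three tiny "MP3" files merged into two containers by artist.
sample = [
    ("a.mp3", {"artist": "X"}, b"aaaa"),
    ("b.mp3", {"artist": "Y"}, b"bb"),
    ("c.mp3", {"artist": "X"}, b"cccccc"),
]
containers, index = merge_small_files(sample, lambda meta: meta["artist"])
```

In this sketch the file system would track one entry per container rather than one per small file, which is the general way merging relieves NameNode memory pressure; the index then maps each original file name back to its position in the container.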