Adaptive and scalable metadata management to support a trillion files

  • Authors:
  • Jing Xing, Jin Xiong, Ninghui Sun, Jie Ma

  • Affiliations:
  • Chinese Academy of Sciences and Graduate University of Chinese Academy of Sciences; Chinese Academy of Sciences; Chinese Academy of Sciences; Chinese Academy of Sciences

  • Venue:
  • Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
  • Year:
  • 2009

Abstract

More and more applications today require file systems to efficiently maintain millions of files or more. Providing high access performance with such a huge number of files and such large directories is a major challenge for cluster file systems. Limited by their static directory structures, existing file systems are prohibitively inefficient for this use. To address this problem, we present a scalable and adaptive metadata management system that aims to maintain a trillion files efficiently. First, our system employs adaptive two-level directory partitioning based on extendible hashing to manage very large directories. Second, it uses fine-grained parallel processing within a directory, which greatly improves the performance of file creation and deletion. Third, it uses multi-layered metadata cache management, which improves memory utilization on the servers. Finally, it uses a dynamic load-balancing mechanism based on consistent hashing, which enables the system to scale up and down easily. Performance results on 32 metadata servers show that our user-level prototype can create more than 74 thousand files per second and retrieve the attributes of more than 270 thousand files per second in a single directory containing 100 million files. Moreover, it delivers a peak throughput of more than 60 thousand file creations per second in a single directory containing 1 billion files.
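
The abstract names two hashing techniques without elaboration. The C sketch below is purely illustrative and not taken from the paper: every name (`dir_partition`, `pick_server`, `fnv1a`) and constant here is an assumption. It shows, in minimal form, how extendible hashing can split one huge directory into partitions keyed by the low bits of a filename hash, and how consistent hashing can then place each partition on a ring of metadata servers.

```c
/* Hypothetical sketch of the two hashing ideas named in the abstract;
 * not the authors' implementation. */
#include <stdint.h>
#include <stdio.h>

#define NSERVERS 32   /* matches the 32-server testbed in the abstract */

/* FNV-1a: a simple, well-known string hash; any uniform hash would do. */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = 1469598103934665603ULL;
    while (*s)
        h = (h ^ (uint8_t)*s++) * 1099511628211ULL;
    return h;
}

/* Extendible hashing: a directory keeps a global depth; a filename falls
 * into the partition indexed by the low global_depth bits of its hash.
 * When a partition overflows, the depth grows by one and only the
 * overflowing partition is split; other entries stay where they are. */
struct ext_dir {
    int global_depth;
};

static uint32_t dir_partition(const struct ext_dir *d, const char *name)
{
    return (uint32_t)(fnv1a(name) & ((1u << d->global_depth) - 1));
}

/* Consistent hashing: servers and partitions hash onto the same ring;
 * a partition is served by the first server clockwise from its point,
 * so adding or removing a server remaps only nearby partitions. */
static uint64_t ring[NSERVERS];

static void build_ring(void)
{
    char key[32];
    for (int i = 0; i < NSERVERS; i++) {
        snprintf(key, sizeof key, "mds-%d", i);
        ring[i] = fnv1a(key);
    }
}

static int pick_server(uint64_t point)
{
    int best = 0;
    uint64_t best_dist = UINT64_MAX;
    for (int i = 0; i < NSERVERS; i++) {
        /* unsigned wraparound gives the clockwise distance mod 2^64 */
        uint64_t dist = ring[i] - point;
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

int main(void)
{
    struct ext_dir d = { .global_depth = 10 };  /* 1024 partitions */
    build_ring();

    const char *names[] = { "file-000001", "file-000002", "checkpoint.dat" };
    for (int i = 0; i < 3; i++) {
        uint32_t part = dir_partition(&d, names[i]);
        char pkey[32];
        snprintf(pkey, sizeof pkey, "part-%u", part);
        printf("%-16s -> partition %4u -> server %d\n",
               names[i], part, pick_server(fnv1a(pkey)));
    }
    return 0;
}
```

Under these assumptions, the two mechanisms complement each other: extendible hashing lets a single directory grow incrementally (only the overflowing partition splits), while consistent hashing means that adding or removing a metadata server remaps only the partitions adjacent to it on the ring, which is consistent with the abstract's claim that the system can scale up and down easily.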