HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems

Authors:
Yifeng Zhu;Hong Jiang;Jun Wang;Feng Xian
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2008

Cited by (8):

SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Adaptive and scalable metadata management to support a trillion files
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

BF-chord: an improved lookup protocol to chord based on Bloom Filter for wireless P2P
WiCOM'09 Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing

Just-in-time analytics on large file systems
FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies

CEFLS: A Cost-Effective File Lookup Service in a Distributed Metadata File System
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

A fast indexing algorithm optimization with user behavior pattern
ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World

Two-level Hash/Table approach for metadata management in distributed file systems
The Journal of Supercomputing

Direct lookup and hash-based metadata placement for local file systems
Proceedings of the 6th International Systems and Storage Conference

Abstract

An efficient, distributed scheme for file mapping (file lookup) is critical to decentralizing metadata management across a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom filter Arrays (HBA) that maps filenames to the metadata servers holding their metadata. Each metadata server maintains two levels of probabilistic arrays, namely Bloom filter arrays, with different levels of accuracy. One array, with lower accuracy, represents the distribution of the entire metadata and trades accuracy for significantly reduced memory overhead. The other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and an implementation in Linux. Simulation results show that the HBA design is highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or super-clusters) and with data at the petabyte scale or higher. Our implementation indicates that, when the system is configured with 16 metadata servers, HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9.
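
The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (BloomFilter, BloomFilterArray), the lookup function, the SHA-256 double-hashing scheme, the filter sizes, and the fallback behavior when no level yields a unique hit are all assumptions made for illustration. The sketch keeps one Bloom filter per metadata server, consults a larger, more accurate array for recently accessed files first, and then falls back to a smaller, less accurate array covering all files.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings (illustrative, not the paper's code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: derive k bit positions by double hashing a SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class BloomFilterArray:
    """One Bloom filter per metadata server; every server would hold a replica."""

    def __init__(self, num_servers, num_bits, num_hashes):
        self.filters = [BloomFilter(num_bits, num_hashes) for _ in range(num_servers)]

    def record(self, filename, server):
        self.filters[server].add(filename)

    def candidates(self, filename):
        # Servers whose filter reports a (possibly false-positive) hit.
        return [s for s, f in enumerate(self.filters) if filename in f]


def lookup(filename, hot_array, global_array):
    """Return (server, level); server is None when no level gives a unique hit."""
    for level, array in (("hot", hot_array), ("global", global_array)):
        hits = array.candidates(filename)
        if len(hits) == 1:
            return hits[0], level
    # Assumption: the caller resolves zero or ambiguous hits some other way,
    # for example by querying the metadata servers directly.
    return None, "fallback"
```

For example, with hot = BloomFilterArray(4, 4096, 4) and glob = BloomFilterArray(4, 512, 2), calling glob.record("/home/a.txt", 2) makes lookup("/home/a.txt", hot, glob) resolve to server 2 at the "global" level; once the file is also recorded in hot, the same call is answered at the "hot" level. The larger bit budget of the hot array reflects the abstract's accuracy-versus-memory trade-off, although the specific sizes here are arbitrary.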