PetaShare: A reliable, efficient and transparent distributed storage management system

Authors:
Tevfik Kosar;Ismail Akturk;Mehmet Balman;Xinqi Wang
Affiliations:
(Correspd. E-mail: tkosar@buffalo.edu) Dept. of Comp. Sci. and Eng., State Univ. of New York, Buffalo, NY, USA and Dept. of Comp. Sci., Louisiana State Univ., Baton Rouge, LA, USA and Center for C ...;Department of Computer Engineering, Bilkent University, Ankara, Turkey;Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA;Department of Computer Science and Engineering, State University of New York, Buffalo, NY, USA and Department of Computer Science, Louisiana State University, Baton Rouge, LA, USA
Venue:
Scientific Programming
Year:
2011

Abstract

Modern collaborative science has placed an increasing burden on data management infrastructure, which must handle the ever larger data archives being generated. Besides functionality, reliability and availability are also key factors in delivering a data management system that can efficiently and effectively meet the challenges posed and compounded by the unbounded increase in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides light-weight clients that enable easy, transparent and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science.
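
The abstract does not spell out how the asynchronously replicated multi-master metadata system works. As a rough illustration of the general technique only, the sketch below shows one common way such a scheme can be arranged: every replica accepts writes locally, queues them, and ships them to its peers later, with conflicting updates settled by a last-writer-wins rule. The class and function names, the logical-clock versioning, and the last-writer-wins policy are all assumptions made for this example; none of them are taken from PetaShare itself.

```python
from collections import deque


class MetadataReplica:
    """One hypothetical metadata master; all names here are illustrative."""

    def __init__(self, site):
        self.site = site
        self.clock = 0         # Lamport-style logical clock (assumption)
        self.entries = {}      # path -> (version, value); version = (clock, site)
        self.outbox = deque()  # local updates not yet shipped to peers

    def write(self, path, value):
        # A local write completes immediately; replication happens later.
        self.clock += 1
        version = (self.clock, self.site)
        self.entries[path] = (version, value)
        self.outbox.append((path, version, value))
        return version

    def apply_remote(self, path, version, value):
        # Advance the clock so later local writes order after what was seen.
        self.clock = max(self.clock, version[0])
        current = self.entries.get(path)
        # Last-writer-wins (assumed policy): keep the larger (clock, site) pair.
        if current is None or version > current[0]:
            self.entries[path] = (version, value)

    def read(self, path):
        entry = self.entries.get(path)
        return None if entry is None else entry[1]


def sync(replicas):
    """Drain every outbox and deliver each queued update to all other replicas."""
    for source in replicas:
        while source.outbox:
            update = source.outbox.popleft()
            for peer in replicas:
                if peer is not source:
                    peer.apply_remote(*update)
```

In this sketch a write becomes visible at the other sites only after `sync` runs, which is the asynchronous part. Two replicas that update the same path between synchronizations end up holding the same value afterwards, because every replica applies the same deterministic comparison to the same set of versions.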