Modern collaborative science places an increasing burden on data management infrastructure, which must handle ever-larger data archives. Besides functionality, reliability and availability are key factors in delivering a data management system that can efficiently and effectively meet the challenges posed by the unbounded growth in the size of data generated by scientific applications. We have developed a reliable and efficient distributed data storage system, PetaShare, which spans multiple institutions across the state of Louisiana. At the back-end, PetaShare provides a unified name space and efficient data movement across geographically distributed storage sites. At the front-end, it provides light-weight clients that enable easy, transparent, and scalable access. In PetaShare, we have designed and implemented an asynchronously replicated multi-master metadata system for enhanced reliability and availability, and an advanced buffering system for improved data transfer performance. In this paper, we present the details of our design and implementation, show performance results, and describe our experience in developing a reliable and efficient distributed data management system for data-intensive science.