Special issue for data intensive eScience
Distributed and Parallel Databases
As scientific research becomes more data-intensive, there is an increasing need for scalable, reliable, high-performance storage systems. Such data repositories must provide both data archival services and rich metadata, and must integrate cleanly with large-scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich-metadata queries for scientific applications. In this paper, we present the design and implementation of ROARS, focusing primarily on the challenge of maintaining data integrity across long time scales. We evaluate the performance of ROARS on a storage cluster, comparing it to the Hadoop distributed file system and a centralized file server. We observe that ROARS has read and write performance that scales with the number of storage nodes, and integrity checking that scales with the size of the largest node. We demonstrate that ROARS continues to function correctly through multiple system failures and reconfigurations. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.
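The abstract does not say how integrity checking is implemented, so the following is only an illustrative sketch of one common approach: a metadata catalog records an expected checksum and the replica locations for each object, and each storage node is audited independently against that catalog. Because the per-node audits do not depend on one another, they could run in parallel, which would be consistent with a checking time bounded by the largest node. All names here (`CatalogEntry`, `audit_node`, `audit_cluster`, the in-memory dictionaries standing in for storage nodes) are hypothetical and are not taken from ROARS.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CatalogEntry:
    """Hypothetical metadata record: expected digest plus replica locations."""
    sha256: str
    replicas: List[str]


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit_node(node_name: str,
               node_store: Dict[str, bytes],
               catalog: Dict[str, CatalogEntry]) -> List[Tuple[str, str]]:
    """Check every replica the catalog says this node should hold.

    Returns (object_id, problem) pairs, where problem is "missing" or "corrupt".
    """
    problems = []
    for object_id, entry in catalog.items():
        if node_name not in entry.replicas:
            continue  # this node is not expected to hold the object
        data = node_store.get(object_id)
        if data is None:
            problems.append((object_id, "missing"))
        elif digest(data) != entry.sha256:
            problems.append((object_id, "corrupt"))
    return problems


def audit_cluster(nodes: Dict[str, Dict[str, bytes]],
                  catalog: Dict[str, CatalogEntry]) -> Dict[str, List[Tuple[str, str]]]:
    """Audit each node on its own; the per-node audits are independent."""
    return {name: audit_node(name, store, catalog) for name, store in nodes.items()}


# Toy cluster (assumed data): one object replicated on three nodes,
# with one damaged copy and one lost copy.
payload = b"iris scan 0001"
catalog = {"obj-1": CatalogEntry(digest(payload), ["node-a", "node-b", "node-c"])}
nodes = {
    "node-a": {"obj-1": payload},
    "node-b": {"obj-1": b"iris scan 0001 (bit rot)"},
    "node-c": {},
}
report = audit_cluster(nodes, catalog)
```

In this toy run, `report` flags the damaged copy on `node-b` and the lost copy on `node-c`; a repair step, not shown, could then re-replicate from the intact copy on `node-a`.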