SOSP '03 Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles
FAB: Building Distributed Enterprise Disk Arrays from Commodity Components
ASPLOS XI Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems
Deep Store: An Archival Storage System Architecture
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Grid'5000: A Large Scale and Highly Reconfigurable Grid Experimental Testbed
GRID '05 Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing
PVFS: A Parallel File System for Linux Clusters
ALS '00 Proceedings of the 4th Annual Linux Showcase & Conference - Volume 4
A High Throughput Atomic Storage Algorithm
ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
Large Data Methods for Multimedia
Proceedings of the 15th International Conference on Multimedia
Ceph: A Scalable, High-Performance Distributed File System
OSDI '06 Proceedings of the 7th Symposium on Operating Systems Design and Implementation
Towards Efficient Search on Unstructured Data: An Intelligent-Storage Approach
Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
Integrating Parallel File Systems with Object-Based Storage Devices
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
IEEE Communications Magazine
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
High Throughput Data-Compression for Cloud Storage
Globe '10 Proceedings of the Third International Conference on Data Management in Grid and Peer-to-Peer Systems
BlobSeer: Next-Generation Data Management for Large Scale Infrastructures
Journal of Parallel and Distributed Computing
To accommodate the needs of large-scale distributed P2P systems, scalable data-management strategies are required, allowing applications to cope efficiently with continuously growing, highly distributed data. This paper addresses the problem of efficiently storing and accessing very large binary data objects (blobs). It proposes a versioning scheme that allows a large number of clients to concurrently read, write, and append data to huge blobs that are fragmented and distributed at a very large scale. Scalability under heavy concurrency is achieved through an original metadata scheme based on a distributed segment tree built on top of a Distributed Hash Table (DHT). Our approach has been implemented and evaluated within our BlobSeer prototype on the Grid'5000 testbed, using up to 175 nodes.
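To make the metadata scheme concrete, the following is a minimal single-process sketch of a segment tree whose nodes live in a key/value store standing in for the DHT. Each node is addressed by a hash of (version, offset, size); leaves hold data chunks, and a range read descends only the subtrees that intersect the requested range. All names here (`DHT`, `write_version`, `read_range`, the chunk size) are illustrative assumptions, not BlobSeer's actual API, and the sketch omits concurrency control and the sharing of unchanged subtrees between versions that the real system relies on.

```python
import hashlib

CHUNK = 4  # bytes per leaf chunk; tiny on purpose, for illustration only


class DHT:
    """Toy stand-in for a distributed hash table: a flat key/value store."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store[key]


def node_key(version, offset, size):
    """Tree nodes are addressed by (version, offset, size) hashed into the key space."""
    return hashlib.sha1(f"{version}:{offset}:{size}".encode()).hexdigest()


def write_version(dht, version, data):
    """Publish `data` as a new blob version; returns the root descriptor."""
    # Create one leaf per chunk.
    nodes = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        key = node_key(version, i, len(chunk))
        dht.put(key, {"leaf": True, "data": chunk})
        nodes.append((i, len(chunk), key))
    # Pairwise-merge nodes level by level until a single root remains.
    while len(nodes) > 1:
        nxt = []
        for j in range(0, len(nodes), 2):
            if j + 1 < len(nodes):
                (lo, ls, lk), (ro, rs, rk) = nodes[j], nodes[j + 1]
                key = node_key(version, lo, ls + rs)
                # `split` is the absolute offset where the right child begins.
                dht.put(key, {"leaf": False, "left": lk, "right": rk, "split": ro})
                nxt.append((lo, ls + rs, key))
            else:
                nxt.append(nodes[j])  # odd node is promoted unchanged
        nodes = nxt
    return nodes[0]


def read_range(dht, root, offset, size):
    """Read `size` bytes at `offset`, visiting only subtrees that overlap the range."""
    out = bytearray()

    def visit(key, node_off, node_size):
        node = dht.get(key)
        if node["leaf"]:
            lo = max(offset, node_off)
            hi = min(offset + size, node_off + node_size)
            out.extend(node["data"][lo - node_off:hi - node_off])
            return
        split = node["split"]
        if offset < split:                      # range touches the left child
            visit(node["left"], node_off, split - node_off)
        if offset + size > split:               # range touches the right child
            visit(node["right"], split, node_off + node_size - split)

    off0, size0, key0 = root
    visit(key0, off0, size0)
    return bytes(out)
```

Because every node key includes the version number, publishing a new version never overwrites existing nodes, which is the property that lets concurrent readers keep traversing an old version while writers build a new one.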