As data volumes grow rapidly across science, engineering, information services, and other application fields, the challenges posed by data-intensive computing become increasingly important. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues that make scalable data management an immediate need. This paper makes three contributions. First, it proposes a set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency. In particular, we highlight the potentially large benefits of using versioning in this context. Second, based on these principles, we propose a set of versioning algorithms, for both data and metadata, that enable high throughput under concurrency. Finally, we implement and evaluate these algorithms in the BlobSeer prototype, which we integrate as a storage backend in the Hadoop MapReduce framework. We perform extensive microbenchmarks as well as experiments with real MapReduce applications, which demonstrate that applying the principles defended in our approach brings substantial benefits to data-intensive applications.