Abstract storage: moving file format-specific abstractions into petabyte-scale storage systems

Authors:
Joe B. Buck; Noah Watkins; Carlos Maltzahn; Scott A. Brandt
Affiliations:
University of California Santa Cruz, Santa Cruz, CA, USA (all four authors)
Venue:
Proceedings of the second international workshop on Data-aware distributed computing
Year:
2009

References (19):
The Vesta parallel file system. ACM Transactions on Computer Systems (TOCS).
A case for intelligent disks (IDISKs). ACM SIGMOD Record.
Active disks: programming model, algorithms and evaluation. Proceedings of the eighth international conference on Architectural support for programming languages and operating systems.
Evolving RPC for active storage. Proceedings of the 10th international conference on Architectural support for programming languages and operating systems.
Active Storage for Large-Scale Data Mining and Multimedia. VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases.
Distributed Computing with Load-Managed Active Storage. HPDC '02: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing.
GPFS: A Shared-Disk File System for Large Computing Clusters. FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies.
Boxwood: abstractions as the foundation for storage infrastructure. OSDI '04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, Volume 6.
MapReduce: simplified data processing on large clusters. OSDI '04: Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, Volume 6.
Bigtable: a distributed storage system for structured data. OSDI '06: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Volume 7.
PVFS: a parallel file system for linux clusters. ALS '00: Proceedings of the 4th annual Linux Showcase & Conference, Volume 4.
Dryad: distributed data-parallel programs from sequential building blocks. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007.
Ceph: a scalable, high-performance distributed file system. OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation.
Efficient guaranteed disk request scheduling with fahrrad. Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008.
Evaluation of active storage strategies for the lustre parallel file system. Proceedings of the 2007 ACM/IEEE conference on Supercomputing.
Scalable performance of the Panasas parallel file system. FAST '08: Proceedings of the 6th USENIX Conference on File and Storage Technologies.
End-to-end performance management for scalable distributed storage. PDSW '07: Proceedings of the 2nd international workshop on Petascale data storage, held in conjunction with Supercomputing '07.
Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS). CLADE '08: Proceedings of the 6th international workshop on Challenges of large applications in distributed environments.
Virtualizing Disk Performance. RTAS '08: Proceedings of the 2008 IEEE Real-Time and Embedded Technology and Applications Symposium.

Cited by (1):
Scientific data services: a high-performance I/O system with array semantics. Proceedings of the first annual workshop on High performance computing meets databases.

Abstract

High-end computing is increasingly I/O bound as computations become more data-intensive and data transport technologies struggle to keep pace with the demands of large-scale, distributed computations. One way to avoid unnecessary I/O is to move the processing to the data, as Google's successful but relatively specialized MapReduce system does. This paper describes our investigation toward a general solution for enabling in-situ computation in a petabyte-scale storage system. We believe our work on flexible, application-specific structured storage is the key to addressing the I/O overhead caused by partitioning data across storage nodes. To manage competing workloads on storage nodes, we leverage our research in system performance management. Our ultimate goal is a general framework for in-situ data-intensive processing, indexing, and searching, which we expect to improve performance on data-intensive workloads by orders of magnitude.
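As a rough illustration of why moving processing to the data avoids I/O, the Python sketch below compares the bytes transferred when a client reads every record from a set of storage nodes and filters them itself against the bytes transferred when each node applies the filter in situ and returns only the matches. The `StorageNode` class, the fixed 64-byte record size, and the 1-in-100 selectivity are hypothetical choices made for illustration; this is not the paper's framework or the interface of any real storage system.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical fixed record size, chosen only for illustration.
RECORD_SIZE = 64

Predicate = Callable[[int], bool]


@dataclass
class StorageNode:
    """A toy (hypothetical) storage node holding one partition of integer records."""
    records: list[int] = field(default_factory=list)

    def read_all(self) -> list[int]:
        # Conventional path: every record leaves the node.
        return list(self.records)

    def filter_in_situ(self, predicate: Predicate) -> list[int]:
        # In-situ path: the predicate runs next to the data,
        # so only matching records leave the node.
        return [r for r in self.records if predicate(r)]


def query_client_side(nodes: list[StorageNode], predicate: Predicate) -> tuple[list[int], int]:
    """Pull all records to the client, then filter them there."""
    results: list[int] = []
    bytes_moved = 0
    for node in nodes:
        batch = node.read_all()
        bytes_moved += len(batch) * RECORD_SIZE
        results.extend(r for r in batch if predicate(r))
    return results, bytes_moved


def query_in_situ(nodes: list[StorageNode], predicate: Predicate) -> tuple[list[int], int]:
    """Ship the predicate to each node and collect only the matches."""
    results: list[int] = []
    bytes_moved = 0
    for node in nodes:
        batch = node.filter_in_situ(predicate)
        bytes_moved += len(batch) * RECORD_SIZE
        results.extend(batch)
    return results, bytes_moved


def is_selected(record: int) -> bool:
    # A selective query that keeps 1 record in 100.
    return record % 100 == 0


if __name__ == "__main__":
    # Four nodes, each holding a contiguous partition of 1,000 records.
    nodes = [StorageNode(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]
    pulled, pulled_bytes = query_client_side(nodes, is_selected)
    local, local_bytes = query_in_situ(nodes, is_selected)
    assert pulled == local  # both strategies return the same answer
    print(pulled_bytes, local_bytes)  # 256000 2560
```

With four nodes of 1,000 records each and a predicate that keeps one record in 100, the client-side path moves 256,000 bytes while the in-situ path moves 2,560. The savings in this toy model track query selectivity; they are not measurements from the paper.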