SDS: a framework for scientific data services

Authors:
Bin Dong;Surendra Byna;Kesheng Wu
Affiliations:
Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA
Venue:
PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop
Year:
2013


Abstract

Large-scale scientific applications typically write their data to parallel file systems in organizations designed for fast writes. Analysis tasks frequently read the data in a pattern that differs from the write pattern and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between the write and read stages of data access on parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and on transparently directing read calls to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports the HDF5 programming interface for reading data. The client library intercepts HDF5 calls using the HDF5 Virtual Object Layer (VOL) and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selection criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.
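
The redirect idea in the abstract can be illustrated with a minimal Python sketch. It is not the SDS implementation and does not use HDF5 or the VOL: the names ToyServer, ToyClient, read_column, and query are hypothetical, the "files" are in-memory lists, and the only reorganization shown is a transpose, chosen as an assumption to stand in for a read-friendly layout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for a server that tracks reorganized replicas."""
    replicas: dict = field(default_factory=dict)  # (path, dataset) -> columns

    def reorganize(self, path, dataset, rows):
        # Keep a transposed copy so each original column is one contiguous list.
        self.replicas[(path, dataset)] = [list(col) for col in zip(*rows)]

    def lookup(self, path, dataset):
        # Return the replica if one exists, otherwise None.
        return self.replicas.get((path, dataset))


class ToyClient:
    """Hypothetical stand-in for a client library that intercepts reads."""

    def __init__(self, server, files):
        self.server = server
        self.files = files      # path -> {dataset: rows}; stands in for disk
        self.redirected = 0     # number of reads served from a replica

    def read_column(self, path, dataset, j):
        replica = self.server.lookup(path, dataset)
        if replica is not None:
            self.redirected += 1
            return replica[j]                # contiguous read from the replica
        rows = self.files[path][dataset]
        return [row[j] for row in rows]      # strided read from the original

    def query(self, path, dataset, j, predicate):
        # Read one column and keep only the values that satisfy the predicate.
        return [v for v in self.read_column(path, dataset, j) if predicate(v)]


# Data written row by row (for example, one row per time step).
files = {"run.h5": {"particles": [[1, 10], [2, 20], [3, 30]]}}
server = ToyServer()
client = ToyClient(server, files)

before = client.read_column("run.h5", "particles", 1)   # original layout
server.reorganize("run.h5", "particles", files["run.h5"]["particles"])
after = client.read_column("run.h5", "particles", 1)    # redirected to replica
selected = client.query("run.h5", "particles", 1, lambda v: v > 15)
```

The caller's code is identical before and after the reorganization; only the lookup inside read_column changes where the data comes from, which is the sense of "transparent" this sketch tries to convey.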