As high-performance computing approaches exascale, existing I/O system designs are struggling to keep pace in both performance and scalability. We propose to address this challenge by adopting database principles and techniques in parallel I/O systems. First, we propose to adopt an array data model, because many scientific applications represent their data as arrays. This strategy follows a cardinal principle of database research: separating the logical view of data from its physical layout. This high-level data model gives the underlying implementation more freedom to optimize the physical layout and to choose the most effective way of accessing the data. For example, knowing that a set of write operations targets a single multi-dimensional array makes it possible to keep the subarrays in a log structure during the writes and to reassemble them later into another physical layout as resources permit. While maintaining the high-level view, the storage system could compress user data to reduce physical storage requirements, co-locate data records that are frequently used together, or replicate data to increase availability and fault tolerance. Additionally, the system could generate secondary data structures such as database indexes and summary statistics. We expect the proposed Scientific Data Services approach to create a "live" storage system that dynamically adjusts to user demands and evolves with massively parallel storage hardware.
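The idea of logging subarray writes and reassembling them later can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Scientific Data Services implementation. The class name `LogStructuredArray`, its methods, the 2-D row-major target layout, and the (min, max) summary are all hypothetical choices made for this example. The point it shows is that reads return the same logical values whether the data still sits in the write log or has been reassembled into a dense layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Block = List[List[float]]  # a 2-D subarray, stored row by row


@dataclass
class LogStructuredArray:
    """Hypothetical 2-D array whose logical view is decoupled from its layout.

    Writes append subarrays to a log (cheap, sequential); reassemble() later
    rewrites them into one row-major buffer and builds a summary statistic.
    """
    rows: int
    cols: int
    log: List[Tuple[int, int, Block]] = field(default_factory=list)
    dense: Optional[List[float]] = None          # row-major layout after reassembly
    stats: Optional[Tuple[float, float]] = None  # (min, max): hypothetical secondary data

    def write(self, r0: int, c0: int, block: Block) -> None:
        # Reject subarrays that fall outside the logical array bounds.
        if r0 < 0 or c0 < 0 or r0 + len(block) > self.rows:
            raise IndexError("subarray outside array bounds")
        if any(c0 + len(row) > self.cols for row in block):
            raise IndexError("subarray outside array bounds")
        # Append-only: the final layout is not touched at write time.
        self.log.append((r0, c0, block))

    def reassemble(self, fill: float = 0.0) -> None:
        # Start from the existing dense layout (or a filled buffer), then replay
        # the log in write order so later writes to the same cells win.
        if self.dense is not None:
            buf = list(self.dense)
        else:
            buf = [fill] * (self.rows * self.cols)
        for r0, c0, block in self.log:
            for dr, row in enumerate(block):
                start = (r0 + dr) * self.cols + c0
                buf[start:start + len(row)] = row
        self.dense, self.log = buf, []
        self.stats = (min(buf), max(buf))

    def read(self, r: int, c: int, fill: float = 0.0) -> float:
        # Same logical answer before or after reassembly: newest log entry wins.
        for r0, c0, block in reversed(self.log):
            if r0 <= r < r0 + len(block) and c0 <= c < c0 + len(block[r - r0]):
                return block[r - r0][c - c0]
        if self.dense is not None:
            return self.dense[r * self.cols + c]
        return fill


# Usage: three subarray writes, one of which overwrites an earlier cell.
a = LogStructuredArray(rows=2, cols=3)
a.write(0, 0, [[1.0, 2.0]])
a.write(1, 1, [[5.0, 6.0]])
a.write(0, 1, [[9.0]])   # overwrites cell (0, 1)
before = a.read(0, 1)    # served from the log
a.reassemble()           # rewrite into the row-major layout, build (min, max)
after = a.read(0, 1)     # served from the dense layout
```

Deferring the layout work to `reassemble()` is what lets a system schedule reorganization "as resources permit". The same pass is also a natural place to compute secondary structures such as the summary statistic shown here.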