The Vesta parallel file system
ACM Transactions on Computer Systems (TOCS)
A case for intelligent disks (IDISKs)
ACM SIGMOD Record
Active disks: programming model, algorithms and evaluation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Evolving RPC for active storage
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Active Storage for Large-Scale Data Mining and Multimedia
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Distributed Computing with Load-Managed Active Storage
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
GPFS: A Shared-Disk File System for Large Computing Clusters
FAST '02 Proceedings of the 1st USENIX Conference on File and Storage Technologies
Boxwood: abstractions as the foundation for storage infrastructure
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
PVFS: a parallel file system for linux clusters
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Ceph: a scalable, high-performance distributed file system
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Efficient guaranteed disk request scheduling with fahrrad
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Evaluation of active storage strategies for the lustre parallel file system
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable performance of the Panasas parallel file system
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
End-to-end performance management for scalable distributed storage
PDSW '07 Proceedings of the 2nd international workshop on Petascale data storage: held in conjunction with Supercomputing '07
Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS)
CLADE '08 Proceedings of the 6th international workshop on Challenges of large applications in distributed environments
RTAS '08 Proceedings of the 2008 IEEE Real-Time and Embedded Technology and Applications Symposium
Scientific data services: a high-performance I/O system with array semantics
Proceedings of the first annual workshop on High performance computing meets databases
Hi-index | 0.00 |
High-end computing is increasingly I/O bound as computations become more data-intensive, and data transport technologies struggle to keep pace with the demands of large-scale, distributed computations. One approach to avoiding unnecessary I/O is to move the processing to the data, as seen in Google's successful, but relatively specialized, MapReduce system. This paper discusses our investigation towards a general solution for enabling in-situ computation in a peta-scale storage system. We believe our work with flexible, application-specific structured storage is the key to addressing the I/O overhead caused by data partitioning across storage nodes. In order to manage competing workloads on storage nodes, our research in system performance management is leveraged. Our ultimate goal is a general framework for in-situ data-intensive processing, indexing, and searching, which we expect to provide orders of magnitude performance increases for data-intensive workloads.