The placement optimization program: a practical solution to the disk file assignment problem
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A calculus of variations approach to file allocation problems in computer systems
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The art of computer programming, volume 3: (2nd ed.) sorting and searching
File Assignment in Parallel I/O Systems with Minimal Variance of Service Time
IEEE Transactions on Computers
Comparative Models of the File Assignment Problem
ACM Computing Surveys (CSUR)
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Wide-area cooperative storage with CFS
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Minerva: An automated resource provisioning tool for large-scale storage systems
ACM Transactions on Computer Systems (TOCS)
Computer Performance Modeling Handbook
Introduction to Linear Optimization
Allocating Data and Operations to Nodes in Distributed Database Design
IEEE Transactions on Knowledge and Data Engineering
Parameter interdependencies of file placement models in a Unix system
SIGMETRICS '84 Proceedings of the 1984 ACM SIGMETRICS conference on Measurement and modeling of computer systems
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
The 8 requirements of real-time stream processing
ACM SIGMOD Record
Network-Aware Operator Placement for Stream-Processing Systems
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Position: short object lifetimes require a delete-optimized storage system
Proceedings of the 11th workshop on ACM SIGOPS European workshop
Multi-site cooperative data stream analysis
ACM SIGOPS Operating Systems Review
Adaptive Control of Extreme-scale Stream Processing Systems
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Ursa minor: versatile cluster-based storage
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Explicit control in a batch-aware distributed file system
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Towards Autonomic Fault Recovery in System-S
ICAC '07 Proceedings of the Fourth International Conference on Autonomic Computing
Autonomic operations in cooperative stream processing systems
HotAC II: Hot Topics in Autonomic Computing
Synergy: sharing-aware component composition for distributed stream processing systems
Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware
Time-varying management of data storage
HotDep'05 Proceedings of the First conference on Hot topics in system dependability
Advances and Challenges for Scalable Provenance in Stream Processing Systems
Provenance and Annotation of Data and Processes
SODA: an optimizing scheduler for large-scale stream-based distributed computer systems
Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
COLA: optimizing stream processing applications via graph partitioning
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, both incoming data and intermediate results may need to be stored to enable analyses at unknown future times. The quantity of potentially useful data would overwhelm even the largest storage system, so a mechanism is needed to retain the data most likely to be used. One recently introduced approach employs retention value functions, which assign each data object a value that changes over time in a prespecified way [Douglis et al. 2004]. Storage space for data entering the system is reclaimed automatically by deleting the data with the lowest current value. Such large systems will naturally contain multiple file systems, each with different properties, and choosing the right file system for a given incoming stream of data is a challenge. In this article we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem that employs retention value functions. The goal is to keep the data of highest overall value while simultaneously balancing the read load across the file systems. The key aspects of such a scheme are quite different from those that arise in traditional file assignment problems. We further motivate this optimization problem and describe a solution, comparing its performance to that of other reasonable schemes via simulation experiments.
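The reclamation mechanism described above (each object carries a time-varying value, and the lowest-valued data is deleted to make room) can be illustrated with a minimal Python sketch. This is not the placement scheme of the article, which the abstract does not specify. The names `StoredObject`, `linear_decay`, and `reclaim`, the linear shape of the retention function, and the single-file-system capacity model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredObject:
    name: str
    size: int                              # space occupied, in arbitrary units
    created: float                         # arrival time of the object
    retention: Callable[[float], float]    # maps object age to current value

    def value_at(self, now: float) -> float:
        """Current value of the object, given the current time."""
        return self.retention(now - self.created)


def linear_decay(initial: float, lifetime: float) -> Callable[[float], float]:
    """Hypothetical retention function: value falls linearly to zero over `lifetime`."""
    return lambda age: max(0.0, initial * (1.0 - age / lifetime))


def reclaim(objects: List[StoredObject], needed: int,
            capacity: int, now: float) -> List[StoredObject]:
    """Delete lowest-value objects until `needed` extra units fit in `capacity`."""
    # Sort by current value, highest first, so the lowest-valued object is last.
    kept = sorted(objects, key=lambda o: o.value_at(now), reverse=True)
    used = sum(o.size for o in kept)
    while kept and used + needed > capacity:
        victim = kept.pop()                # object with the lowest current value
        used -= victim.size
    return kept


if __name__ == "__main__":
    store = [
        StoredObject("a", 4, 0.0, linear_decay(10.0, 100.0)),
        StoredObject("b", 4, 0.0, linear_decay(5.0, 100.0)),
        StoredObject("c", 4, 50.0, linear_decay(6.0, 100.0)),
    ]
    survivors = reclaim(store, needed=4, capacity=12, now=60.0)
    print([o.name for o in survivors])
```

In this toy run the store is full when 4 more units arrive at time 60, and object `b`, whose value has decayed the furthest, is the one removed. A placement scheme of the kind the abstract describes would additionally decide which of several file systems receives each incoming stream, which this sketch does not attempt.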