Information management applications exhibit a wide range of query performance and result freshness goals. Some, such as web search, require interactive performance but can safely operate on stale data. Others, such as policy violation detection, require up-to-date results but can tolerate relaxed performance goals. Furthermore, information processing applications must be able to ingest updates at the scale of an entire organization. In this paper, we present LazyBase, a system that allows users to trade off query performance and result freshness to satisfy this full range of goals. LazyBase breaks data ingestion into a pipeline of operations to minimize ingest time and uses models of processing and query performance to execute user queries. Initial results illustrate the feasibility of the pipelined model, highlight a rich space of trade-offs between result freshness and query performance, and show that LazyBase often outperforms existing solutions.
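To make the pipelined trade-off concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): updates flow through successive stages, with earlier stages holding fresher data that is slower to query and the final merged index holding staler data that is fast to query. A query's freshness requirement determines how many stages it consults. All class and method names here (`PipelinedStore`, `ingest`, `advance`, `max_staleness_s`) are illustrative assumptions.

```python
import time


class PipelinedStore:
    """Toy model of pipelined ingestion with freshness-aware queries."""

    def __init__(self):
        # Earlier stages hold fresher but slower-to-query data.
        self.unsorted = []        # raw ingested updates (freshest, slowest to query)
        self.sorted_runs = []     # batched, sorted runs (intermediate stage)
        self.authoritative = {}   # fully merged index (stalest, fastest to query)

    def ingest(self, key, value):
        # Ingest is cheap: just append to the raw buffer.
        self.unsorted.append((key, value, time.time()))

    def advance(self):
        # Background pipeline work: sort the raw buffer into a run,
        # then merge all runs into the authoritative index.
        if self.unsorted:
            self.sorted_runs.append(sorted(self.unsorted))
            self.unsorted = []
        for run in self.sorted_runs:
            for key, value, _ts in run:
                self.authoritative[key] = value
        self.sorted_runs = []

    def query(self, key, max_staleness_s=float("inf")):
        # Fast path: if arbitrarily stale results are acceptable,
        # consult only the merged index.
        if max_staleness_s == float("inf"):
            return self.authoritative.get(key)
        # Fresh path: also scan the slower, fresher pipeline stages,
        # letting later-ingested values win.
        result = self.authoritative.get(key)
        for stage in (self.sorted_runs, [self.unsorted]):
            for run in stage:
                for k, v, _ts in run:
                    if k == key:
                        result = v
        return result
```

A stale query sees only what the pipeline has finished merging, while a fresh query pays extra scan cost to observe in-flight updates; this mirrors the freshness/performance spectrum the abstract describes.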