Information management applications exhibit a wide range of query performance and result freshness goals. Some, such as web search, require interactive performance but can safely operate on stale data. Others, such as policy violation detection, require up-to-date results but can tolerate relaxed performance goals. Furthermore, information processing applications must be able to ingest updates at the scale of an entire organization. In this paper, we present LazyBase, a system that allows users to trade off query performance and result freshness to satisfy this full range of goals. LazyBase breaks data ingestion into a pipeline of operations to minimize ingest time and uses models of processing and query performance to execute user queries. Initial results illustrate the feasibility of the pipelined model, highlight a rich space of trade-offs between result freshness and query performance, and show that LazyBase often outperforms existing solutions.
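To make the pipelined trade-off concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): updates flow through successive stages, with earlier stages holding fresher data that is slower to query and the final merged index holding staler data that is fast to query. A query's freshness requirement determines how many stages it consults. All class and method names here (`PipelinedStore`, `ingest`, `advance`, `max_staleness_s`) are illustrative assumptions.

```python
import time


class PipelinedStore:
    """Toy model of pipelined ingestion with freshness-aware queries."""

    def __init__(self):
        # Earlier stages hold fresher but slower-to-query data.
        self.unsorted = []        # raw ingested updates (freshest, slowest to query)
        self.sorted_runs = []     # batched, sorted runs (intermediate stage)
        self.authoritative = {}   # fully merged index (stalest, fastest to query)

    def ingest(self, key, value):
        # Ingest is cheap: just append to the raw buffer.
        self.unsorted.append((key, value, time.time()))

    def advance(self):
        # Background pipeline work: sort the raw buffer into a run,
        # then merge all runs into the authoritative index.
        if self.unsorted:
            self.sorted_runs.append(sorted(self.unsorted))
            self.unsorted = []
        for run in self.sorted_runs:
            for key, value, _ts in run:
                self.authoritative[key] = value
        self.sorted_runs = []

    def query(self, key, max_staleness_s=float("inf")):
        # Fast path: if arbitrarily stale results are acceptable,
        # consult only the merged index.
        if max_staleness_s == float("inf"):
            return self.authoritative.get(key)
        # Fresh path: also scan the slower, fresher pipeline stages,
        # letting later-ingested values win.
        result = self.authoritative.get(key)
        for stage in (self.sorted_runs, [self.unsorted]):
            for run in stage:
                for k, v, _ts in run:
                    if k == key:
                        result = v
        return result
```

A stale query sees only what the pipeline has finished merging, while a fresh query pays extra scan cost to observe in-flight updates; this mirrors the freshness/performance spectrum the abstract describes.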