Delay aware querying with Seaweed

Authors:
Dushyanth Narayanan;Austin Donnelly;Richard Mortier;Antony Rowstron
Affiliations:
Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom;Microsoft Research, Cambridge, United Kingdom
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006



Abstract

Large highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries.

The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems and their data will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable.

We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data.

Seaweed is a scalable query infrastructure supporting incremental results, online in-network aggregation and completeness prediction. It is built on a distributed hash table (DHT) but unlike previous DHT based approaches it does not redistribute data across the network. It exploits the DHT infrastructure for failure resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed's scalability against other approaches and also evaluate the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
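To make the delay/completeness trade-off concrete, here is a minimal sketch of how a completeness prediction could be computed from per-endsystem metadata. It is not Seaweed's actual algorithm, which the abstract does not specify. The names `EndsystemSummary`, `prob_available_by` and `predicted_completeness`, and the exponential downtime model, are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class EndsystemSummary:
    """Hypothetical replicated metadata for one endsystem."""
    estimated_rows: float    # rows expected to match the query, taken from a data summary
    online: bool             # whether the endsystem is available right now
    mean_downtime_s: float   # assumed availability-model parameter (must be > 0)


def prob_available_by(summary: EndsystemSummary, delay_s: float) -> float:
    """Probability that the endsystem has been available at some point within delay_s.

    Assumption: an offline endsystem returns after an exponentially
    distributed downtime. A real availability model could be very different.
    """
    if summary.online:
        return 1.0
    return 1.0 - math.exp(-delay_s / summary.mean_downtime_s)


def predicted_completeness(summaries: list[EndsystemSummary], delay_s: float) -> float:
    """Expected fraction of matching rows returned if the user waits delay_s seconds."""
    total = sum(s.estimated_rows for s in summaries)
    if total == 0:
        return 1.0  # nothing matches, so the empty result is already complete
    expected = sum(s.estimated_rows * prob_available_by(s, delay_s) for s in summaries)
    return expected / total


# Example: one endsystem is online, one is offline with a mean downtime of one hour.
summaries = [
    EndsystemSummary(estimated_rows=100, online=True, mean_downtime_s=1.0),
    EndsystemSummary(estimated_rows=100, online=False, mean_downtime_s=3600.0),
]
for delay in (0, 3600, 36000):
    print(delay, round(predicted_completeness(summaries, delay), 3))
```

Under these assumptions the predicted completeness grows monotonically with the delay the user is willing to tolerate, which is the kind of explicit feedback the abstract describes. Only the small summaries and model parameters need to be replicated, not the underlying data.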