Large, highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries.

The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems and their data will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable.

We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data.

Seaweed is a scalable query infrastructure supporting incremental results, online in-network aggregation and completeness prediction. It is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. It exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed's scalability against other approaches and also evaluate the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
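
To make the delay/completeness trade-off concrete, the following is a minimal Python sketch of how a completeness curve could be computed from replicated metadata. It is an illustration under stated assumptions, not Seaweed's actual method: the abstract does not specify the form of the data summaries or the availability models, so the `EndsystemMeta` record, its fields, and the `predict_completeness` function are all hypothetical. In particular, the availability model is reduced here to a single predicted return time per endsystem, and the data summary to a single estimated count of matching rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EndsystemMeta:
    """Hypothetical replicated metadata for one endsystem (an assumption)."""
    estimated_rows: float     # matching rows, as estimated from a data summary
    online_now: bool          # whether the endsystem is currently available
    expected_return_s: float  # predicted seconds until it returns (unused if online)


def predict_completeness(
    metas: List[EndsystemMeta], delays_s: List[float]
) -> List[Tuple[float, float]]:
    """For each delay, estimate the fraction of matching rows reachable by then."""
    total = sum(m.estimated_rows for m in metas)
    if total == 0:
        # Nothing is expected to match, so any result is trivially complete.
        return [(d, 1.0) for d in delays_s]
    curve = []
    for d in delays_s:
        # Count rows on endsystems that are online or predicted back within d.
        covered = sum(
            m.estimated_rows
            for m in metas
            if m.online_now or m.expected_return_s <= d
        )
        curve.append((d, covered / total))
    return curve


if __name__ == "__main__":
    metas = [
        EndsystemMeta(estimated_rows=100, online_now=True, expected_return_s=0),
        EndsystemMeta(estimated_rows=50, online_now=False, expected_return_s=3600),
        EndsystemMeta(estimated_rows=50, online_now=False, expected_return_s=86400),
    ]
    for delay, fraction in predict_completeness(metas, [0, 3600, 86400]):
        print(f"after {delay:>6} s: predicted completeness {fraction:.2f}")
```

With the three illustrative endsystems above, half of the expected rows are reachable immediately, three quarters after one hour, and all of them after one day. A user could read such a curve to decide how long a query is worth waiting for. Only the small per-endsystem metadata is consulted, not the data itself.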