Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Optimal histograms for limiting worst-case error propagation in the size of join results
ACM Transactions on Database Systems (TODS)
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Online prediction algorithms for databases and operating systems
Online prediction algorithms for databases and operating systems
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
An overview of query optimization in relational systems
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Summary cache: a scalable wide-area Web cache sharing protocol
Proceedings of the ACM SIGCOMM '98 conference on Applications, technologies, architectures, and protocols for computer communication
Self-tuning histograms: building histograms without looking at data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A comparison of selectivity estimators for range queries on metric attributes
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Counting large numbers of events in small registers
Communications of the ACM
The state of the art in distributed query processing
ACM Computing Surveys (CSUR)
A robust, optimization-based approach for approximate answering of aggregate queries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Design and evaluation of a wide-area event notification service
ACM Transactions on Computer Systems (TOCS)
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Fast incremental maintenance of approximate histograms
ACM Transactions on Database Systems (TODS)
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Automating Statistics Management for Query Optimizers
IEEE Transactions on Knowledge and Data Engineering
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Kademlia: A Peer-to-Peer Information System Based on the XOR Metric
IPTPS '01 Revised Papers from the First International Workshop on Peer-to-Peer Systems
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
ACM Transactions on Computer Systems (TOCS)
Anthill: A Framework for the Development of Agent-Based Peer-to-Peer Systems
ICDCS '02 Proceedings of the 22 nd International Conference on Distributed Computing Systems (ICDCS'02)
The impact of DHT routing geometry on resilience and proximity
Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Routing networks for distributed hash tables
Proceedings of the twenty-second annual symposium on Principles of distributed computing
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Measurement, modeling, and analysis of a peer-to-peer file-sharing workload
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Gossip-Based Computation of Aggregate Information
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Subscription Summarization: A New Paradigm for Efficient Publish/Subscribe Systems
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Epidemic-Style Proactive Aggregation in Large Overlay Networks
ICDCS '04 Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS'04)
Approximate Aggregation Techniques for Sensor Databases
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Spatio-Temporal Aggregation Using Sketches
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
The price of validity in dynamic networks
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Robust Aggregation Protocols for Large-Scale Overlay Networks
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Mercury: supporting scalable multi-attribute range queries
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable distributed information management system
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Medians and beyond: new aggregation techniques for sensor networks
SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Histograms revisited: when are histograms the best approximation method for aggregates over joins?
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space efficient mining of multigraph streams
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Gossip-based aggregation in large dynamic networks
ACM Transactions on Computer Systems (TOCS)
KLEE: a framework for distributed top-k query algorithms
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Scientific data management in the coming decade
ACM SIGMOD Record
Peer-to-Peer Systems and Applications (Lecture Notes in Computer Science)
Peer-to-Peer Systems and Applications (Lecture Notes in Computer Science)
Counting at Large: Efficient Cardinality Estimation in Internet-Scale Data Networks
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Consistent selectivity estimation via maximum entropy
The VLDB Journal — The International Journal on Very Large Data Bases
Discovering and exploiting keyword and attribute-value co-occurrences to improve P2P routing indices
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Optimized stratified sampling for approximate query processing
ACM Transactions on Database Systems (TODS)
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
On synopses for distinct-value estimation under multiset operations
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Scalable approximate query processing with the DBO engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Designing a DHT for low latency and high throughput
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Efficient peer-to-peer keyword searching
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability.