Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
On power-law relationships of the Internet topology
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A case for end system multicast (keynote address)
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A robust, optimization-based approach for approximate answering of aggregate queries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
A scalable content-addressable network
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
An adaptive peer-to-peer network for distributed caching of OLAP results
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Overcoming Limitations of Sampling for Aggregation Queries
Proceedings of the 17th International Conference on Data Engineering
Aqua: A Fast Decision Support Systems Using Approximate Query Answers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
Middleware '01 Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg
Dynamic sample selection for approximate query processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Building Low-Diameter P2P Networks
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Gossip-Based Computation of Aggregate Information
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Multi-dimensional range queries in sensor networks
Proceedings of the 1st international conference on Embedded networked sensor systems
A Peer-to-peer Framework for Caching Range Queries
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Accurate, scalable in-network identification of p2p traffic using application signatures
Proceedings of the 13th international conference on World Wide Web
A bi-level Bernoulli scheme for database sampling
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Effective use of block-level sampling in statistics estimation
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing
Mercury: supporting scalable multi-attribute range queries
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Aggregate queries in peer-to-peer OLAP
Proceedings of the 7th ACM international workshop on Data warehousing and OLAP
Exploiting locality for scalable information retrieval in peer-to-peer networks
Information Systems
Querying the internet with PIER
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Clustering in peer-to-peer file sharing workloads
IPTPS'04 Proceedings of the Third international conference on Peer-to-Peer Systems
P2P OLAP: Data model, implementation and case study
Information Systems
Materialized view management in peer to peer environment
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Network-aware summarisation for resource discovery in P2P-content networks
Future Generation Computer Systems
OLAP query reformulation in peer-to-peer data warehousing
Information Systems
Secure Distributed Data Aggregation
Foundations and Trends in Databases
Self-adaptive approximate queries for large-scale information aggregation
International Journal of Web and Grid Services
Hi-index | 0.00 |
Peer-to-peer (P2P) databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale ad hoc analysis queries, for example, aggregation queries, on these databases poses unique challenges. Exact solutions can be time consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated due to several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer, the data is often highly correlated, and, moreover, even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.