Efficient Approximate Query Processing in Peer-to-Peer Networks

Authors:
Benjamin Arai;Gautam Das;Dimitrios Gunopulos;Vana Kalogeraki
Affiliations:
IEEE;-;IEEE;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007

Abstract

Peer-to-peer (P2P) databases are becoming prevalent on the Internet for the distribution and sharing of documents, applications, and other digital media. Answering large-scale ad hoc analysis queries, such as aggregation queries, on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement, given the distributed and dynamic nature of P2P databases. In this paper, we present novel sampling-based techniques for approximate answering of ad hoc aggregation queries in such databases. Efficiently computing a high-quality random sample of the database in a P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers; within each peer, the data is often highly correlated; and even collecting a random sample of the peers is difficult. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph, as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
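The abstract names three ingredients: random walks to pick peers, sub-sampling within each peer, and a two-phase (pilot, then adaptive) plan. The following is a minimal sketch of how these could fit together for a SUM query. It is not the paper's algorithm. Everything here is an illustrative assumption: the function names, the `jump` and `sub_size` parameters, the use of a degree-weighted (Hansen-Hurwitz style) estimator, the assumption that the total edge count is known, and the simple variance rule standing in for the adaptive step.

```python
import math
import random
import statistics


def random_walk_peers(graph, start, num_peers, jump, rng):
    """Walk the overlay graph and record every `jump`-th visited peer."""
    selected, current = [], start
    while len(selected) < num_peers:
        for _ in range(jump):
            current = rng.choice(graph[current])  # move to a random neighbor
        selected.append(current)
    return selected


def local_sum(tuples, predicate, sub_size, rng):
    """Local SUM over matching tuples, scaled up from a sub-sample if large."""
    if len(tuples) <= sub_size:
        return sum(v for v in tuples if predicate(v))
    sub = rng.sample(tuples, sub_size)
    return sum(v for v in sub if predicate(v)) * len(tuples) / sub_size


def peer_terms(graph, data, peers, predicate, sub_size, rng):
    """One term per sampled peer: local aggregate / visit probability.

    Assumes a long walk visits peer p with probability deg(p) / (2|E|)
    and that 2|E| is known, which is a simplification.
    """
    two_e = sum(len(nbrs) for nbrs in graph.values())
    return [
        local_sum(data[p], predicate, sub_size, rng) * two_e / len(graph[p])
        for p in peers
    ]


def two_phase_sum(graph, data, predicate, pilot=20, target_rel_err=0.1,
                  jump=5, sub_size=50, max_peers=500, seed=0):
    """Phase 1: pilot sample. Phase 2: visit more peers if variance is high."""
    rng = random.Random(seed)
    start = rng.choice(list(graph))
    peers = random_walk_peers(graph, start, pilot, jump, rng)
    terms = peer_terms(graph, data, peers, predicate, sub_size, rng)
    estimate = statistics.mean(terms)

    # Stand-in adaptive rule: enough peers that the standard error of the
    # mean is at most target_rel_err * |estimate|.
    needed = pilot
    if estimate != 0:
        variance = statistics.variance(terms)
        needed = math.ceil(variance / (target_rel_err * estimate) ** 2)
    needed = min(max(needed, pilot), max_peers)

    extra = needed - pilot
    if extra > 0:
        more = random_walk_peers(graph, peers[-1], extra, jump, rng)
        terms += peer_terms(graph, data, more, predicate, sub_size, rng)
        peers += more
    return statistics.mean(terms), len(peers)
```

Dividing each local aggregate by the peer's visit probability is what compensates for the walk favoring well-connected peers. The `jump` parameter spaces out the recorded peers so that consecutive selections are less correlated, and the pilot phase lets the total number of visited peers depend on how skewed the data turns out to be.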