Distinct value estimation on peer-to-peer networks

Authors: Zubin Joseph; Gautam Das; Leonidas Fegaras
Affiliation: UT Arlington (all authors)
Venue: Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments
Year: 2008

Abstract

Peer-to-peer (P2P) networks have become very popular on the Internet, with millions of peers worldwide sharing large volumes of data. In the assistive healthcare sector, P2P networks are likely to emerge that interconnect the patient databases of hospitals, clinics, and research laboratories and allow controlled sharing among them. However, the sheer scale of these networks makes it difficult to gather statistics that could be used to build new features. In this paper, we present a technique for estimating the number of distinct values that match a query over the network. We evaluate the technique experimentally and present results that demonstrate its effectiveness, as well as its flexibility in supporting a variety of queries and applications.
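The abstract does not spell out the estimator itself. As a point of reference only, the sketch below implements the Guaranteed-Error Estimator (GEE) of Charikar et al. (PODS 2000), a standard sampling-based distinct-value estimator; it is a swapped-in technique and not presented as the authors' P2P method. It assumes that a uniform random sample of tuples has already been drawn from the network (for example by random-walk peer sampling, not shown) and that the network-wide tuple count n is known. The function name `gee_estimate` is hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def gee_estimate(sample: Sequence[Hashable], population_size: int) -> float:
    """GEE distinct-value estimate from a uniform random sample (sketch).

    D_hat = sqrt(n / r) * f1 + sum_{j >= 2} f_j, where n is the number of
    tuples in the whole network (assumed known), r is the sample size, and
    f_j is the number of values seen exactly j times in the sample.
    """
    r = len(sample)
    if r == 0:
        raise ValueError("sample must be non-empty")
    if population_size < r:
        raise ValueError("population_size must be at least the sample size")
    counts = Counter(sample)                  # value -> occurrences in the sample
    freq_of_freq = Counter(counts.values())   # j -> f_j
    f1 = freq_of_freq.get(1, 0)               # singletons: values seen exactly once
    # Values seen two or more times are each counted once.
    repeated = sum(f for j, f in freq_of_freq.items() if j >= 2)
    # Scale singletons by sqrt(n / r) to account for unseen values.
    return sqrt(population_size / r) * f1 + repeated


# Hypothetical usage: 8 sampled tuples from a network holding 800 tuples.
print(gee_estimate(["a", "a", "b", "c", "d", "d", "d", "e"], 800))  # 32.0
```

In the usage example, f1 = 3 (b, c, e) and two values repeat (a, d), so the estimate is sqrt(800 / 8) * 3 + 2 = 32. The sqrt(n/r) factor on singletons is a compromise between two extremes: each singleton could stand for just one value or for roughly n/r unseen values.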