ALVIS peers: a scalable full-text peer-to-peer retrieval engine
P2PIR '06 Proceedings of the international workshop on Information retrieval in peer-to-peer networks
Data allocation scheme based on term weight for P2P information retrieval
Proceedings of the 9th annual ACM international workshop on Web information and data management
Distinct value estimation on peer-to-peer networks
Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments
Efficiently Handling Dynamics in Distributed Link Based Authority Analysis
WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
ACM Transactions on Computer Systems (TOCS)
Statistical structures for Internet-scale data management
The VLDB Journal — The International Journal on Very Large Data Bases
PINTS: peer-to-peer infrastructure for tagging systems
IPTPS'08 Proceedings of the 7th international conference on Peer-to-peer systems
Hi-index | 0.00 |
Counting in general, and estimating the cardinality of (multi-) sets in particular, is highly desirable for a large variety of applications, representing a foundational block for the efficient deployment and access of emerging internetscale information systems. Examples of such applications range from optimizing query access plans in internet-scale databases, to evaluating the significance (rank/score) of various data items in information retrieval applications. The key constraints that any acceptable solution must satisfy are: (i) efficiency: the number of nodes that need be contacted for counting purposes must be small in order to enjoy small latency and bandwidth requirements; (ii) scalability, seemingly contradicting the efficiency goal: arbitrarily large numbers of nodes nay need to add elements to a (multi-) set, which dictates the need for a highly distributed solution, avoiding server-based scalability, bottleneck, and availability problems; (iii) access and storage load balancing: counting and related overhead chores should be distributed fairly to the nodes of the network; (iv) accuracy: tunable, robust (in the presence of dynamics and failures) and highly accurate cardinality estimation; (v) simplicity and ease of integration: special, solution-specific indexing structures should be avoided. In this paper, first we contribute a highly-distributed, scalable, efficient, and accurate (multi-) set cardinality estimator. Subsequently, we show how to use our solution to build and maintain histograms, which have been a basic building block for query optimization for centralized databases, facilitating their porting into the realm of internet-scale data networks.