Network flows: theory, algorithms, and applications
Network flows: theory, algorithms, and applications
Min-wise independent permutations (extended abstract)
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Query processing in a system for distributed databases (SDD-1)
ACM Transactions on Database Systems (TODS)
Selectively estimation for Boolean queries
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
The state of the art in distributed query processing
ACM Computing Surveys (CSUR)
Chord: A scalable peer-to-peer lookup service for internet applications
Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Informed content delivery across adaptive overlay networks
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
R* Optimizer Validation and Performance Evaluation for Distributed Queries
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Object Fusion in Mediator Systems
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Kademlia: A Peer-to-Peer Information System Based on the XOR Metric
IPTPS '01 Revised Papers from the First International Workshop on Peer-to-Peer Systems
Some complexity questions related to distributive computing(Preliminary Report)
STOC '79 Proceedings of the eleventh annual ACM symposium on Theory of computing
The Piazza Peer Data Management System
IEEE Transactions on Knowledge and Data Engineering
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Keyword Search in DHT-Based Peer-to-Peer Networks
ICDCS '05 Proceedings of the 25th IEEE International Conference on Distributed Computing Systems
KLEE: a framework for distributed top-k query algorithms
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Optimal peer selection for minimum-delay peer-to-peer streaming with rateless codes
Proceedings of the ACM workshop on Advances in peer-to-peer multimedia streaming
Approximately detecting duplicates for streaming data using stable bloom filters
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Sharing aggregate computation for distributed queries
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Boosting topic-based publish-subscribe systems with dynamic clustering
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Staying FIT: efficient load shedding techniques for distributed stream processing
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
XML processing in DHT networks
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Survey of clustering algorithms
IEEE Transactions on Neural Networks
One is enough: distributed filtering for duplicate elimination
Proceedings of the 20th ACM international conference on Information and knowledge management
Hi-index | 0.00 |
In a variety of applications, ranging from data integration to distributed query evaluation, there is a need to obtain sets of data items from several sources (peers) and compute their union. As these sets often contain common data items, avoiding the transmission of redundant information is essential for effective union computation. In this paper we define the notion of optimal union plans for non-disjoint data sets residing on distinct peers, and present efficient algorithms for computing and executing such optimal plans. Our algorithms avoid redundant data transmission and optimally exploit the network bandwidth capabilities. A challenge in the design of optimal plans is the lack of a complete map of the distribution of the data items among peers. We analyze the information required for optimal planning and propose novel techniques to obtain compact, cheap to communicate, description of the data sources. We then exploit it for efficient union computation with reasonable accuracy. We demonstrate experimentally the superiority of our approach over the common naive union computation, showing it improves the performance by an order of magnitude.