Distributed joins have gained importance in the past decade, mainly due to the increased number of available data sources on the Internet. In this work we extend Bloomjoin, the state-of-the-art algorithm for distributed joins, so that it minimizes the network usage of query execution based on database statistics. We present four extensions of the algorithm and construct a query optimizer that selects the best extension for each query. Our theoretical analysis and experimental evaluation show significant network cost savings compared to the original Bloomjoin algorithm.
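For readers unfamiliar with the baseline, the following is a minimal sketch of a basic Bloom-filter join between two sites, simulated in a single process. It illustrates only the general idea of the original Bloomjoin (ship a compact filter of join keys instead of the keys themselves), not the four extensions or the optimizer described above. The class and function names, the SHA-256 double-hashing scheme, and the default filter parameters are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; hashing scheme is an assumption)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloomjoin(site_a_rows, site_b_rows, key_a, key_b, num_bits=1024, num_hashes=3):
    """Join rows held at two (simulated) sites using a Bloom filter.

    Returns the joined pairs, the filter size in bytes, and the number of
    rows site B would have to ship back to site A.
    """
    # Site A: summarize its join keys in a Bloom filter.
    bloom = BloomFilter(num_bits, num_hashes)
    for row in site_a_rows:
        bloom.add(row[key_a])
    filter_bytes = len(bloom.bits)  # what travels from A to B

    # Site B: keep only rows whose key may exist at A (false positives possible).
    candidates = [row for row in site_b_rows if row[key_b] in bloom]

    # Site A: an exact hash join on the shipped rows discards false positives.
    index = {}
    for row in site_a_rows:
        index.setdefault(row[key_a], []).append(row)
    result = [(a, b) for b in candidates for a in index.get(b[key_b], [])]
    return result, filter_bytes, len(candidates)


# Hypothetical example data.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"cust": 1, "item": "disk"},
    {"cust": 3, "item": "tape"},
    {"cust": 2, "item": "cpu"},
    {"cust": 1, "item": "ram"},
]
result, filter_bytes, shipped = bloomjoin(customers, orders, "id", "cust")
```

In this sketch the network cost is the filter size plus the rows shipped back from site B. A larger filter lowers the false-positive rate and so the number of rows shipped, at the price of a bigger filter message; this is the kind of trade-off a statistics-based optimizer could tune.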