Probabilistic threshold join over distributed uncertain data

  • Authors:
  • Lei Deng;Fei Wang;Benxiong Huang

  • Affiliations:
  • Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, China;Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, China;Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, China

  • Venue:
  • WAIM'11 Proceedings of the 12th international conference on Web-age information management
  • Year:
  • 2011

Abstract

Many emerging applications collect large amounts of uncertain data from multiple sources in a distributed manner. Previous efforts on querying uncertain data in distributed environments have focused only on ranking and skyline queries; join queries have not been addressed in earlier work despite their importance in databases. In this paper, we address the distributed probabilistic threshold join query, which retrieves, from distributed sites, the results that satisfy the join condition and whose combined probabilities meet the threshold requirement. We propose a new kind of Bloom filter, called the Probability Bloom Filter (PBF), to represent a set with a probabilistic attribute, and we design a PBF-based Bloomjoin algorithm that executes distributed probabilistic threshold join queries communication-efficiently. Furthermore, we provide a theoretical analysis of the network cost of our algorithm and validate it by simulation. The experimental results show that our algorithm reduces network cost compared with the original Bloomjoin algorithm in most scenarios.
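
The abstract does not specify how a PBF is laid out, so the following Python sketch is only one plausible reading, not the authors' design. It assumes that each filter slot stores the largest probability hashed into it, that the combined probability of a joined pair is the product of the two tuple probabilities (independence), and that the threshold is positive. The names `ProbabilityBloomFilter`, `upper_bound`, and `threshold_join` are hypothetical.

```python
import hashlib


class ProbabilityBloomFilter:
    """Hypothetical PBF: each slot keeps the largest probability hashed into it."""

    def __init__(self, num_slots=1024, num_hashes=3):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.slots = [0.0] * num_slots

    def _positions(self, key):
        # Derive num_hashes slot indices from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, key, prob):
        for pos in self._positions(key):
            self.slots[pos] = max(self.slots[pos], prob)

    def upper_bound(self, key):
        # Never smaller than the largest probability inserted for this key;
        # 0.0 means the key is definitely absent.
        return min(self.slots[pos] for pos in self._positions(key))


def threshold_join(site_r, site_s, threshold):
    """Join two sites' (key, payload, prob) tuples, keeping pairs whose
    product of probabilities (assumed independence) meets the threshold."""
    # Site R summarizes its join keys and probabilities in a PBF.
    pbf = ProbabilityBloomFilter()
    for key, _, prob in site_r:
        pbf.add(key, prob)

    # Site S ships only tuples that could still reach the threshold.
    shipped = [t for t in site_s if t[2] * pbf.upper_bound(t[0]) >= threshold]

    # Site R completes the join on the shipped tuples and drops false positives.
    results = []
    for key_s, payload_s, prob_s in shipped:
        for key_r, payload_r, prob_r in site_r:
            if key_r == key_s and prob_r * prob_s >= threshold:
                results.append((key_r, payload_r, payload_s, prob_r * prob_s))
    return results, len(shipped)


site_r = [("a", "r1", 0.9), ("b", "r2", 0.3)]
site_s = [("a", "s1", 0.8), ("b", "s2", 0.9), ("c", "s3", 1.0)]
results, shipped_count = threshold_join(site_r, site_s, threshold=0.5)
```

Under these assumptions the filter produces no false negatives: a pair that meets the threshold always passes the pre-filter, because the stored upper bound is never smaller than the true probability. Hash collisions can only cause extra tuples to be shipped, which the final join step discards.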