Efficient processing of probabilistic set-containment queries on uncertain set-valued data

  • Authors:
  • Xiaolong Zhang; Ke Chen; Lidan Shou; Gang Chen; Yuan Gao; Kian-Lee Tan

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China; College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China; College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China; College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China; College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China; School of Computing, National University of Singapore, Singapore

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 0.07

Abstract

Set-valued data is a natural and concise representation for modeling complex objects. As an important operation in object-oriented or object-relational databases, set-containment query processing over set-valued data has been extensively studied in previous work. Recently, there has been a growing realization that uncertain information is a first-class citizen in modern database management. As such, there is strong demand for the study of set-containment queries over uncertain set-valued data. This paper investigates how such queries can be processed efficiently. Based on the popular possible-world semantics, we first present a practical model in which the uncertainty in set-valued data is represented by existential probabilities, and propose the probabilistic set-containment semantics and its generalization, the expected Jaccard containment. Second, to avoid the expensive computation of enumerating all possible worlds, we develop efficient schemes for computing these two probabilistic semantics. Third, we introduce two important queries, namely the probability threshold containment query (PTCQ) and the probability threshold containment join (PTCJ), and propose novel techniques to process them efficiently. Finally, we conduct extensive experiments to study the efficiency of the proposed methods. The experimental results indicate that the proposed methods process uncertain set-containment queries efficiently.
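To make the semantics in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithms. It assumes a simplified model that the abstract does not spell out: the query set is certain, each element of a stored set exists independently with its existential probability, and Jaccard containment means |Q ∩ S| / |Q|. All function names and the toy data are hypothetical. Under these assumptions, the containment probability reduces to a product and the expected Jaccard containment to an average, so neither requires enumerating possible worlds; a brute-force enumeration is included only as a cross-check.

```python
from itertools import product
from math import prod


def containment_probability(query, uncertain_set):
    """P(query is a subset of S), assuming elements of S exist independently.

    uncertain_set maps each element to its existential probability;
    elements absent from the mapping have probability 0.
    """
    return prod(uncertain_set.get(e, 0.0) for e in query)


def expected_jaccard_containment(query, uncertain_set):
    """E[|query ∩ S| / |query|], obtained by linearity of expectation."""
    if not query:
        return 1.0  # convention chosen for this sketch: empty query is fully contained
    return sum(uncertain_set.get(e, 0.0) for e in query) / len(query)


def containment_probability_bruteforce(query, uncertain_set):
    """Same quantity computed by enumerating all 2^n possible worlds."""
    items = list(uncertain_set.items())
    total = 0.0
    for mask in product([False, True], repeat=len(items)):
        world_prob = 1.0
        world = set()
        for (elem, p), present in zip(items, mask):
            world_prob *= p if present else 1.0 - p
            if present:
                world.add(elem)
        if query <= world:  # query is contained in this possible world
            total += world_prob
    return total


def ptcq(query, database, threshold):
    """Naive probability threshold containment query: scan every record."""
    return [
        rid
        for rid, uncertain_set in database.items()
        if containment_probability(query, uncertain_set) >= threshold
    ]


# Hypothetical toy database of uncertain sets.
database = {
    "r1": {"a": 0.9, "b": 0.8, "c": 0.5},
    "r2": {"a": 0.4, "b": 1.0},
}
query = {"a", "b"}
matches = ptcq(query, database, threshold=0.5)
```

The linear scan in `ptcq` only illustrates the query semantics; the techniques the paper proposes for processing PTCQ and PTCJ efficiently are not reproduced here.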