Continuous monitoring of skylines over uncertain data streams

  • Authors:
  • Xiaofeng Ding; Xiang Lian; Lei Chen; Hai Jin

  • Affiliations:
  • Services Computing Tech. & Sys. Lab, Cluster and Grid Computing Lab, School of Computer Science, Huazhong University of Sci. & Tech., 1037 Luoyu Road, Wuhan, Hubei, China; Department of Computer Science, Hong Kong University of Sci. & Tech., Clear Water Bay, Kowloon, Hong Kong, China; Department of Computer Science, Hong Kong University of Sci. & Tech., Clear Water Bay, Kowloon, Hong Kong, China; Services Computing Tech. & Sys. Lab, Cluster and Grid Computing Lab, School of Computer Science, Huazhong University of Sci. & Tech., 1037 Luoyu Road, Wuhan, Hubei, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 0.07

Abstract

Uncertain data are inevitable in many applications because of factors such as the limitations of measuring equipment and delays in data updates. Although modeling and querying uncertain data have recently attracted considerable attention from the database community, many critical issues remain open in conducting advanced analysis on uncertain data. In this paper, we study the execution of the probabilistic skyline query over uncertain data streams. We propose a novel sliding-window skyline model in which each uncertain tuple has some probability of being in the skyline at a given timestamp t. Formally, Wp-Skyline(p,t) contains all tuples whose probability of being in the skyline at timestamp t is at least p. In a stream environment, however, computing a probabilistic skyline over the large number of uncertain tuples in the sliding window is expensive. To compute Wp-Skyline efficiently, we propose the candidate list approach, which maintains lists of candidates that might become skyline tuples in future sliding windows. We also propose algorithms that continuously monitor newly arriving and expired data to maintain the skyline candidate set incrementally. To further reduce the cost of deciding whether a candidate tuple belongs to the skyline, we propose an enhanced refinement strategy based on a multi-dimensional indexing structure combined with a grouping-and-conquer strategy. To validate the effectiveness of the proposed approach, we conduct extensive experiments on both real and synthetic data sets and compare it against basic techniques.
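
To make the Wp-Skyline(p,t) definition concrete, the following is a minimal brute-force sketch in Python. It is not the paper's candidate list approach. It rests on several assumptions that the abstract does not state: each tuple carries a single existence probability, tuples are independent, smaller attribute values are preferred, and the window at timestamp t holds the tuples that arrived in the last `window_size` time units. Under those assumptions, a tuple's skyline probability is its own existence probability multiplied by the probability that no dominating tuple in the window exists. The names `UTuple`, `dominates`, `skyline_probability`, and `wp_skyline` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class UTuple(NamedTuple):
    """An uncertain stream tuple (hypothetical representation)."""
    values: Tuple[float, ...]  # attribute values; smaller is better (assumption)
    prob: float                # existence probability (assumed uncertainty model)
    ts: int                    # arrival timestamp


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(u: UTuple, window: List[UTuple]) -> float:
    """P(u exists) times P(no tuple dominating u exists), assuming independence."""
    p = u.prob
    for v in window:
        if dominates(v.values, u.values):
            p *= 1.0 - v.prob
    return p


def wp_skyline(stream: List[UTuple], window_size: int, p: float, t: int) -> List[UTuple]:
    """Brute-force Wp-Skyline(p, t): tuples in the window whose skyline probability is >= p."""
    # Time-based window: tuples with timestamps in (t - window_size, t] (assumption).
    window = [u for u in stream if t - window_size < u.ts <= t]
    return [u for u in window if skyline_probability(u, window) >= p]


if __name__ == "__main__":
    a = UTuple((1, 1), 0.5, 1)
    b = UTuple((2, 2), 0.9, 2)
    c = UTuple((0, 3), 0.8, 3)
    stream = [a, b, c]
    print(wp_skyline(stream, window_size=3, p=0.5, t=3))
    print(wp_skyline(stream, window_size=3, p=0.5, t=4))
```

In this example, `b` is excluded at t = 3 because `a` dominates it and is still in the window; once `a` expires at t = 4, `b` qualifies. This recomputation costs time quadratic in the window size at every timestamp, which is the kind of expense that incremental candidate maintenance is meant to avoid.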