Understanding the meaning of a shifted sky: a general framework on extending skyline query

  • Authors and affiliations:
  • Zhenjie Zhang: Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  • Hua Lu: Department of Computer Science, Faculties of Engineering, Science and Medicine, Aalborg University, Aalborg, Denmark
  • Beng Chin Ooi: Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  • Anthony K. Tung: Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2010

Abstract

Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, an object p is said to dominate another object q if p is no worse than q on all dimensions and is better than q on at least one dimension. The skyline of a data set then consists of all objects not dominated by any other object. To better cater to application requirements, such as controlling the size of the skyline or handling data sets that are not well structured, various works have extended the definition of skyline based on variants of the dominance relationship. In view of the proliferation of variants, this paper proposes a generalized framework to guide the extension of the skyline query from the conventional definition to different variants. Our framework explicitly and carefully examines the properties that should be preserved in a variant of the dominance relationship so that the variant (1) maintains the original advantages of the skyline while extending its adaptivity to application semantics, and (2) keeps the computational complexity almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties, and present some new dominance relationships obtained by relaxing some of the properties. These relationships are general enough for us to design new top-k skyline queries that return robust results of a controllable size. We analyze the existing skyline algorithms based on their minimum requirements on dominance properties. We also extend our analysis to data sets with missing values, and present extensive experimental results on the combinations of new dominance relationships and skyline algorithms.
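
The traditional dominance relationship and the resulting skyline described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a naive quadratic scan written only to make the definition concrete. It assumes that smaller values are better on every dimension, and the names dominates, skyline, and the hotels sample data are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q under traditional dominance.

    Assumption (not from the paper): smaller is better on every dimension.
    p dominates q if p is no worse than q on all dimensions and strictly
    better on at least one.
    """
    no_worse_everywhere = all(a <= b for a, b in zip(p, q))
    better_somewhere = any(a < b for a, b in zip(p, q))
    return no_worse_everywhere and better_somewhere


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return all points not dominated by any other point (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)]
print(skyline(hotels))  # [(50, 8), (60, 3), (80, 2)]
```

Here (70, 5) is dominated by (60, 3) and (90, 9) is dominated by (50, 8), so neither appears in the skyline. A variant of the dominance relationship, in the sense the abstract discusses, would correspond to swapping out the dominates function while leaving the skyline definition unchanged.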