On high dimensional skylines

  • Authors:
  • Chee-Yong Chan;H. V. Jagadish;Kian-Lee Tan;Anthony K. H. Tung;Zhenjie Zhang

  • Affiliations:
  • National University of Singapore & University of Michigan;National University of Singapore & University of Michigan;National University of Singapore & University of Michigan;National University of Singapore & University of Michigan;National University of Singapore & University of Michigan

  • Venue:
  • EDBT'06: Proceedings of the 10th International Conference on Advances in Database Technology
  • Year:
  • 2006

Abstract

In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multi-dimensional dataset. In a high-dimensional space, however, skyline points no longer offer any interesting insights, because there are too many of them. In this paper, we introduce a novel metric, called skyline frequency, that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different subsets of the dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting, as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding the top-k frequent skyline points. However, the skyline algorithms proposed thus far typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
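To make the skyline frequency metric concrete, here is a minimal brute-force sketch in Python. It is not the paper's approximate algorithm. It assumes that smaller values are better on every dimension, and that a point's skyline frequency is the number of non-empty subspaces (out of 2^d - 1) in which no other point dominates it. The function names `dominates`, `skyline`, `skyline_frequencies` and `top_k_frequent` are hypothetical and chosen for illustration only. The loop over every subspace makes visible the exponential cost that the abstract describes.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """True if p dominates q in subspace `dims`.

    Assumption: smaller is better. p must be no worse than q on every
    dimension in `dims` and strictly better on at least one.
    """
    no_worse = all(p[d] <= q[d] for d in dims)
    strictly_better = any(p[d] < q[d] for d in dims)
    return no_worse and strictly_better


def skyline(points: List[Point], dims: Sequence[int]) -> List[int]:
    """Indices of points that no other point dominates in subspace `dims`."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p, dims) for j, q in enumerate(points) if j != i)
    ]


def skyline_frequencies(points: List[Point]) -> List[int]:
    """Exact skyline frequency per point, over all 2^d - 1 non-empty subspaces."""
    d = len(points[0])
    freq = [0] * len(points)
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):  # exponential in d
            for i in skyline(points, dims):
                freq[i] += 1
    return freq


def top_k_frequent(points: List[Point], k: int) -> List[int]:
    """Indices of the k points with the highest skyline frequency (ties keep input order)."""
    freq = skyline_frequencies(points)
    order = sorted(range(len(points)), key=lambda i: -freq[i])
    return order[:k]


if __name__ == "__main__":
    pts = [(1, 2, 3), (2, 1, 3), (3, 3, 1), (2, 2, 2)]
    print(skyline_frequencies(pts))  # [4, 4, 4, 3]
    print(top_k_frequent(pts, 2))    # [0, 1]
```

In this toy data set, all four points are in the full-space skyline, so the full skyline alone cannot rank them. Skyline frequency can: the point `(2, 2, 2)` is never a skyline point in a one-dimensional subspace, so its frequency is lower. With d dimensions the exact computation runs 2^d - 1 skyline queries. That cost is why the paper turns to approximate algorithms.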