A clustering based approach for skyline diversity

  • Authors:
  • Zhenhua Huang; Yang Xiang; Bo Zhang; Xiaoling Liu

  • Affiliations:
  • Department of Computer Science, Tongji University, Shanghai 200092, China and The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092 ...; Department of Computer Science, Tongji University, Shanghai 200092, China; Department of Computer Science, Tongji University, Shanghai 200092, China; School of Computer Science, Fudan University, Shanghai 200433, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Abstract

Skyline query processing has recently received considerable attention in the database and data-mining communities. To the best of our knowledge, existing research focuses mainly on how to return the whole skyline set efficiently. However, as the cardinality and dimensionality of the input objects increase, the number of skyline objects grows exponentially, and the resulting "huge" skyline set becomes practically useless to users. On the other hand, in most real applications the objects are clustered, so many objects have similar attribute values. Motivated by these facts, in this paper we present a novel type of query, the SkyCluster query, to capture skyline diversity and improve the usefulness of the skyline result. The SkyCluster query integrates K-means clustering into skyline computation and returns K "representative" and "diverse" skyline objects to users. A straightforward way to process such a query is to simply combine the existing techniques developed for skyline-only and clustering-only processing. But this approach is costly, since skyline computation and K-means clustering are both CPU-intensive. We propose an efficient evaluation approach, based on the circinal index, that seamlessly integrates subspace skyline computation, K-means clustering, and representative selection. We also present a novel optimization heuristic to further improve query performance. An experimental study shows that our approach is both efficient and effective.
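
To make the query semantics concrete, the "straightforward approach" mentioned in the abstract can be sketched as follows: compute the skyline, cluster it with K-means, and return the skyline object closest to each centroid. This is only an illustrative sketch of that naive baseline, not the circinal-index method the paper proposes. The function names (`dominates`, `skyline`, `kmeans`, `sky_cluster_naive`), the convention that smaller values are better, and the nearest-to-centroid rule for choosing representatives are all assumptions made for the example.

```python
import random
from math import dist


def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; an empty
        # cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def sky_cluster_naive(points, k):
    """Hypothetical baseline: skyline first, then K-means, then one
    representative per non-empty cluster (the member nearest its centroid)."""
    sky = skyline(points)
    if len(sky) <= k:
        return sky
    centroids, clusters = kmeans(sky, k)
    return [min(cl, key=lambda p: dist(p, c))
            for c, cl in zip(centroids, clusters) if cl]


points = [(1, 9), (2, 8), (8, 2), (9, 1), (5, 5), (6, 6), (9, 9)]
representatives = sky_cluster_naive(points, 2)
```

Because the skyline is computed in full before clustering begins, this baseline pays the cost of both steps separately, which is the inefficiency the abstract attributes to the straightforward approach. It may also return fewer than K objects if a cluster ends up empty.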