A clustering based approach for skyline diversity

Authors:
Zhenhua Huang;Yang Xiang;Bo Zhang;Xiaoling Liu
Affiliations:
Department of Computer Science, Tongji University, Shanghai 200092, China and The Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092 ...;Department of Computer Science, Tongji University, Shanghai 200092, China;Department of Computer Science, Tongji University, Shanghai 200092, China;School of Computer Science, Fudan University, Shanghai 200433, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Skyline query processing has recently received a lot of attention in the database and data-mining communities. To the best of our knowledge, existing research mainly focuses on how to efficiently return the whole skyline set. However, as the cardinality and dimensionality of the input objects increase, the number of skyline objects grows exponentially, and such a "huge" skyline set is of little practical use to users. On the other hand, in most real applications the objects are clustered, so many objects have similar attribute values. Motivated by these facts, in this paper we present a novel type of query, the SkyCluster query, which captures skyline diversity and improves the usefulness of the skyline result. The SkyCluster query integrates K-means clustering into skyline computation and returns K "representative" and "diverse" skyline objects to users. A straightforward way to process such a query is to combine the existing techniques developed separately for skyline computation and for clustering. This approach is costly, however, since both skyline computation and K-means clustering are CPU-intensive. We propose an efficient evaluation approach based on the circinal index that seamlessly integrates subspace skyline computation, K-means clustering, and representative selection. We also present a novel optimization heuristic that further improves query performance. An experimental study shows that our approach is both efficient and effective.
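
As a rough illustration of the straightforward baseline mentioned in the abstract (skyline first, then K-means, then one representative per cluster), the following minimal Python sketch may help. It is not the circinal-index approach proposed in the paper. Several details are assumptions made only for this example: smaller attribute values are preferred, K-means is seeded with the first K distinct skyline points, and the representative of a cluster is the skyline object closest to its centroid. The function names and the sample data are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def dominates(p, q):
    """p dominates q if p is no worse in every attribute and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def kmeans(points, k, iters=100):
    """Plain Lloyd-style K-means on tuples, seeded deterministically with
    the first k distinct points (an assumption made for reproducibility)."""
    centroids = list(dict.fromkeys(points))[:k]
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters


def sky_cluster_baseline(points, k):
    """Skyline, then K-means over the skyline, then the skyline object
    nearest to each centroid as that cluster's representative."""
    centroids, clusters = kmeans(skyline(points), k)
    return [
        min(cluster, key=lambda p: dist(p, c))
        for c, cluster in zip(centroids, clusters)
        if cluster
    ]


# Hypothetical two-attribute objects, e.g. (price, distance); smaller is better.
hotels = [(1, 9), (2, 8), (3, 7), (7, 3), (8, 2), (9, 1), (8, 8), (9, 9), (3, 9)]
print(skyline(hotels))                  # [(1, 9), (2, 8), (3, 7), (7, 3), (8, 2), (9, 1)]
print(sky_cluster_baseline(hotels, 2))  # [(2, 8), (8, 2)]
```

In this toy data set the six skyline objects fall into two groups of similar objects, and the baseline returns one object from each group. The cost of running a full skyline pass followed by an independent K-means pass is what the abstract identifies as the motivation for an integrated, index-based evaluation.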