Skyline query processing has recently received considerable attention in the database and data-mining communities. To the best of our knowledge, existing research focuses mainly on how to return the whole skyline set efficiently. However, as the cardinality and dimensionality of the input objects increase, the number of skyline objects grows exponentially, and such a "huge" skyline set is too large to be useful to users. Moreover, in most real applications the objects are clustered, so many of them have similar attribute values. Motivated by these facts, in this paper we present a novel type of query, the SkyCluster query, which captures skyline diversity and improves the usefulness of the skyline result. The SkyCluster query integrates K-means clustering into skyline computation and returns K "representative" and "diverse" skyline objects to users. A straightforward way to process such a query is to combine the existing techniques developed separately for skyline computation and for clustering. This approach is costly, however, since skyline computation and K-means clustering are both CPU-intensive. We propose an efficient evaluation approach, based on the circinal index, that seamlessly integrates subspace skyline computation, K-means clustering, and representative selection. We also present a novel optimization heuristic that further improves query performance. An experimental study shows that our approach is both efficient and effective.
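To make the query semantics concrete, the following is a minimal Python sketch of the straightforward baseline the abstract mentions: compute the skyline first, then run K-means over the skyline objects, then pick one representative per cluster. It is not the circinal-index approach proposed in the paper. Several details are assumptions made for illustration only: smaller values are taken to be better in every dimension, the representative of a cluster is taken to be the skyline object nearest to the cluster centroid, and the function names (`dominates`, `skyline`, `kmeans`, `naive_skycluster`) are hypothetical.

```python
import math
import random


def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means with seeded random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def naive_skycluster(points, k):
    """Baseline: skyline, then K-means, then one representative per cluster.

    Assumption: the representative is the skyline object closest to the centroid.
    """
    sky = skyline(points)
    k = min(k, len(sky))
    centroids, clusters = kmeans(sky, k)
    return [
        min(cl, key=lambda p: math.dist(p, c))
        for c, cl in zip(centroids, clusters)
        if cl
    ]


# Hypothetical (price, distance) data: two groups of skyline objects plus dominated points.
points = [(1, 9), (2, 8), (3, 7.5), (8, 2), (9, 1), (5, 9), (9, 9), (4, 8.5)]
sky = skyline(points)
reps = naive_skycluster(points, 2)
```

In this sketch the skyline pass and the clustering pass each scan the data independently, which illustrates why the abstract describes the combined baseline as costly.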