A major challenge in subspace clustering is that it may generate an explosive number of clusters at high computational cost, which severely restricts its use. The problem worsens as the dimensionality of the data increases. In this paper, we propose mining representative subspace clusters in high-dimensional data to alleviate the problem. Typically, subspace clusters can themselves be clustered into groups, and several representative clusters can be generated from each group. Unfortunately, when the size of the set of representative clusters is specified, finding the optimal set is NP-hard. To solve this problem efficiently, we present an approximate method, PCoC. The greatest advantage of our method is that it needs only a subset of the subspace clusters as input. Our performance study shows the effectiveness and efficiency of the method.
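To make the idea of picking a fixed-size set of representative clusters concrete, the following is a minimal sketch of a generic greedy heuristic. It is not PCoC, whose details the abstract does not give. The representation of a subspace cluster as a pair (set of object ids, set of dimensions), the similarity measure (product of Jaccard overlaps), and the names `similarity` and `greedy_representatives` are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic greedy heuristic, NOT the PCoC method.
# Assumption: a subspace cluster is a pair (object_ids, dimensions), both non-empty sets.

def similarity(a, b):
    """Hypothetical similarity: Jaccard overlap on objects times Jaccard overlap on dimensions."""
    (objs_a, dims_a), (objs_b, dims_b) = a, b
    jaccard_objs = len(objs_a & objs_b) / len(objs_a | objs_b)
    jaccard_dims = len(dims_a & dims_b) / len(dims_a | dims_b)
    return jaccard_objs * jaccard_dims


def greedy_representatives(clusters, k):
    """Greedily choose up to k cluster indices.

    Each step adds the cluster that most increases the total, over all
    clusters, of the similarity to their best-matching representative.
    """
    chosen = []
    best_sim = [0.0] * len(clusters)  # similarity of each cluster to its best representative so far
    for _ in range(min(k, len(clusters))):
        best_gain, best_idx = -1.0, None
        for i, cand in enumerate(clusters):
            if i in chosen:
                continue
            # Improvement in coverage if cand became a representative.
            gain = sum(
                max(similarity(cand, c) - best_sim[j], 0.0)
                for j, c in enumerate(clusters)
            )
            if gain > best_gain:
                best_gain, best_idx = gain, i
        chosen.append(best_idx)
        rep = clusters[best_idx]
        best_sim = [max(best_sim[j], similarity(rep, c)) for j, c in enumerate(clusters)]
    return chosen


if __name__ == "__main__":
    # Two groups of overlapping clusters living in different subspaces.
    clusters = [
        ({1, 2, 3}, {0, 1}),
        ({1, 2, 3, 4}, {0, 1}),
        ({1, 2}, {0, 1}),
        ({10, 11, 12}, {2, 3}),
        ({10, 11}, {2, 3}),
    ]
    print(greedy_representatives(clusters, 2))
```

With two representatives requested, the heuristic tends to pick one cluster from each group, which mirrors the grouping intuition in the abstract. A greedy pass like this gives no optimality guarantee; it only illustrates why an approximate method is attractive when the exact problem is NP-hard.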