Mining representative subspace clusters in high-dimensional data

  • Authors:
  • Guanhua Chen;Xiuli Ma;Dongqing Yang;Shiwei Tang;Meng Shuai

  • Affiliations:
  • School of EECS, Peking University, Beijing, China;School of EECS, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Ministry of Education, China;School of EECS, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies, Peking University, Ministry of Education, China;School of EECS, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Ministry of Education, China;School of EECS, Peking University, Beijing, China and Key Laboratory of Machine Perception, Peking University, Ministry of Education, China

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
  • Year:
  • 2009

Abstract

A major challenge in subspace clustering is that it may generate an explosive number of clusters at high computational cost, which severely restricts its use. The problem worsens as the dimensionality of the data increases. In this paper, we propose mining representative subspace clusters in high-dimensional data to alleviate the problem. Typically, subspace clusters can themselves be clustered into groups, and several representative clusters can be generated from each group. Unfortunately, when the size of the set of representative clusters is specified, finding the optimal set is NP-hard. To solve this problem efficiently, we present an approximate method, PCoC. The greatest advantage of our method is that it needs only a subset of the subspace clusters as input. Our performance study shows the effectiveness and efficiency of the method.
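
The idea of grouping subspace clusters and keeping a fixed number of representatives can be illustrated with a minimal sketch. This is not PCoC, whose details the abstract does not give; it substitutes a standard greedy heuristic (farthest-first traversal) for the NP-hard optimal selection. The cluster representation as an `(objects, dims)` pair, the Jaccard distance over (object, dimension) cells, and the function names `cells`, `distance`, and `pick_representatives` are all assumptions made for illustration.

```python
from itertools import product


def cells(cluster):
    """Expand a subspace cluster (objects, dims) into its set of (object, dim) cells."""
    objects, dims = cluster
    return set(product(objects, dims))


def distance(a, b):
    """Jaccard distance between the cell sets of two subspace clusters (assumed measure)."""
    ca, cb = cells(a), cells(b)
    union = ca | cb
    if not union:
        return 0.0
    return 1.0 - len(ca & cb) / len(union)


def pick_representatives(clusters, k):
    """Greedily pick up to k representatives, then group every cluster with its nearest one.

    Farthest-first traversal is a stand-in heuristic, not the paper's PCoC method.
    Returns (list of representative indices, dict mapping representative -> member indices).
    """
    if not clusters or k <= 0:
        return [], {}
    reps = [0]  # arbitrary starting representative
    nearest = [distance(c, clusters[0]) for c in clusters]
    while len(reps) < min(k, len(clusters)):
        nxt = max(range(len(clusters)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # every remaining cluster duplicates a chosen representative
        reps.append(nxt)
        for i, c in enumerate(clusters):
            nearest[i] = min(nearest[i], distance(c, clusters[nxt]))
    groups = {r: [] for r in reps}
    for i, c in enumerate(clusters):
        best = min(reps, key=lambda r: distance(c, clusters[r]))
        groups[best].append(i)
    return reps, groups


# Hypothetical input: four subspace clusters over object ids and dimension indices.
example = [
    ({1, 2, 3}, {0, 1}),
    ({1, 2, 3, 4}, {0, 1}),
    ({7, 8}, {2, 3}),
    ({7, 8, 9}, {2, 3}),
]
reps, groups = pick_representatives(example, k=2)
```

In this toy input the two heavily overlapping pairs fall into separate groups, each summarized by one representative. The sketch needs all pairwise-comparable clusters up front, whereas the abstract states that the proposed method requires only a subset of the subspace clusters as input.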