Spectral Embedded Clustering: A Framework for In-Sample and Out-of-Sample Spectral Clustering

  • Authors:
  • Feiping Nie; Zinan Zeng; Ivor W. Tsang; Dong Xu; Changshui Zhang

  • Affiliations:
  • University of Texas, Arlington, TX, USA; School of Computer Engineering, Nanyang Technological University, Singapore; School of Computer Engineering, Nanyang Technological University, Singapore; School of Computer Engineering, Nanyang Technological University, Singapore; State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2011

Abstract

Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold on high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and become even worse than $K$-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and $K$-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data.
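
The abstract does not state the objective function, so the following is only a minimal sketch of how a linearity regularization might be attached to a relaxed SC objective and then reused for out-of-sample points. It assumes an objective of the form tr(FᵀLF) + μ(‖XW + 1bᵀ − F‖² + γ‖W‖²) subject to FᵀF = I, with a Gaussian-affinity normalized Laplacian standing in for L (not the local-regression Laplacian mentioned above). The function names `sec_embed` and `sec_out_of_sample` and the parameters `mu`, `gamma`, and `sigma` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sec_embed(X, n_clusters, mu=1.0, gamma=1.0, sigma=1.0):
    """Sketch of a linearity-regularized spectral embedding.

    X has shape (n_samples, n_features). Returns the relaxed assignment
    matrix F (n, c) plus a linear map (W, b) fitted so that X @ W + b ~ F.
    All modelling choices here are assumptions, not the paper's exact recipe.
    """
    n, d = X.shape

    # Gaussian affinity and symmetric normalized Laplacian (assumed choice).
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

    # Eliminating W and b from the ridge term leaves tr(F^T Lg F), where
    # Lg = H - Xc (Xc^T Xc + gamma I)^{-1} Xc^T and H is the centering matrix.
    mean = X.mean(axis=0)
    Xc = X - mean
    H = np.eye(n) - np.ones((n, n)) / n
    G = np.linalg.solve(Xc.T @ Xc + gamma * np.eye(d), Xc.T)  # shape (d, n)
    Lg = H - Xc @ G

    # Relaxed solution: eigenvectors of L + mu * Lg with smallest eigenvalues.
    M = L + mu * Lg
    M = (M + M.T) / 2.0  # symmetrize against round-off
    _, vecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    F = vecs[:, :n_clusters]

    # Closed-form ridge regression from the data to the embedding.
    W = G @ F
    b = F.mean(axis=0) - mean @ W
    return F, W, b


def sec_out_of_sample(X_new, W, b):
    """Map unseen points into the embedding with the learned linear function."""
    return X_new @ W + b
```

Under these assumptions, cluster labels would typically be obtained by running $K$-means on the rows of `F`, and unseen points would be assigned by embedding them with `sec_out_of_sample` and picking the nearest centroid; no eigendecomposition is needed for the new points.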