Subspace clustering of high-dimensional data: a predictive approach

Authors:
Brian McWilliams;Giovanni Montana
Affiliations:
Department of Informatics, ETH, Zürich, Switzerland;Department of Mathematics, Imperial College London, London, UK
Venue:
Data Mining and Knowledge Discovery
Year:
2014


Abstract

In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace estimated by means of a principal component analysis (PCA). The proposed algorithm, Predictive Subspace Clustering (PSC), partitions the data into clusters while simultaneously estimating cluster-wise PCA parameters. The algorithm minimises an objective function that depends upon a new measure of influence for PCA models. A penalised version of the algorithm is also described for carrying out simultaneous subspace clustering and variable selection. The convergence of PSC is discussed in detail, and extensive simulation results and comparisons to competing methods are presented. The comparative performance of PSC has been assessed on six real gene expression data sets, for which PSC often provides state-of-the-art results.
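
The alternation between partitioning the data and estimating cluster-wise PCA parameters can be illustrated with a minimal sketch. This is not the PSC algorithm itself: the abstract does not define the influence measure, so the sketch assumes a plain squared reconstruction error as a stand-in criterion, which makes it closer to a generic K-subspaces scheme. The function names (`fit_pca`, `residual`, `alternating_subspace_clustering`) and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal directions of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of vt are right singular vectors, i.e. principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def residual(X, mean, components):
    """Squared reconstruction error of each row of X under one PCA model."""
    centred = X - mean
    projected = centred @ components.T @ components
    return np.sum((centred - projected) ** 2, axis=1)

def alternating_subspace_clustering(X, n_clusters, n_components,
                                    n_iter=50, seed=0):
    """Alternate between fitting one PCA per cluster and reassigning points.

    Assumption: points are reassigned by reconstruction error, not by the
    influence measure used in the paper.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=X.shape[0])  # random initial partition
    for _ in range(n_iter):
        errors = np.full((X.shape[0], n_clusters), np.inf)
        for k in range(n_clusters):
            members = X[labels == k]
            if members.shape[0] <= n_components:
                continue  # too few points to fit this cluster's subspace
            mean, comps = fit_pca(members, n_components)
            errors[:, k] = residual(X, mean, comps)
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the partition no longer changes
        labels = new_labels
    return labels
```

Like other alternating schemes of this kind, the sketch depends on its random initial partition and may stop at a poor local solution, so in practice it would be run from several seeds.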