A Topological View of Unsupervised Learning from Noisy Data

Authors: P. Niyogi; S. Smale; S. Weinberger
Contact: smale@math.berkeley.edu (S. Smale); shmuel@math.uchicago.edu (S. Weinberger)
Venue: SIAM Journal on Computing
Year: 2011

Abstract

In this paper, we take a topological view of unsupervised learning. From this point of view, clustering may be interpreted as trying to find the number of connected components of an underlying geometrically structured probability distribution, in a sense that we will make precise. We construct a geometrically structured probability distribution that seems appropriate for modeling data in very high dimensions. A special case of our construction is the mixture of Gaussians, where Gaussian noise is concentrated around a finite set of points (the means). More generally, we consider Gaussian noise concentrated around a low-dimensional manifold and discuss how to recover the homology of this underlying geometric core from data that do not lie on it. We show that if the variance of the Gaussian noise is small in a certain sense, then the homology can be learned with high confidence by an algorithm that has a weak (linear) dependence on the ambient dimension. Our algorithm has a natural interpretation as a spectral learning algorithm using a combinatorial Laplacian of a suitable data-derived simplicial complex.
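The abstract makes two claims that a small example can make concrete. First, clustering amounts to counting connected components. Second, homology can be read off a combinatorial Laplacian. The Python sketch below is a hypothetical illustration only and is not the authors' algorithm. It makes several assumptions of its own:

- It builds a Vietoris-Rips complex, a common stand-in; the paper's own data-derived complex may differ.
- It discards low-density points with a simple neighbor-count rule.
- Its parameter values (`eps`, `r`, `min_neighbors`, `sigma`) and all function names are invented for this example.

The code computes the Betti numbers b0 and b1 as the kernel dimensions of the combinatorial Laplacians L0 = d1 d1^T and L1 = d1^T d1 + d2 d2^T, where d1 and d2 are the oriented boundary matrices. This is a standard fact of combinatorial Hodge theory. b0 is the number of connected components (the clusters), and b1 counts independent loops.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix via the Gram trick; cost O(n^2 * D)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off
    np.fill_diagonal(d, 0.0)
    return d


def denoise(x, r, min_neighbors):
    """Hypothetical cleaning rule: keep points with at least `min_neighbors`
    other samples within distance r (i.e. points in high-density regions)."""
    counts = (pairwise_distances(x) < r).sum(axis=1) - 1  # exclude the point itself
    return x[counts >= min_neighbors]


def rips_2_skeleton(x, eps):
    """Vertices, edges and triangles of the Vietoris-Rips complex at scale eps."""
    adj = pairwise_distances(x) < eps
    n = len(x)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i, j]]
    # Each triangle (i < j < k) is generated exactly once, from its edge (i, j).
    triangles = [(i, j, k) for (i, j) in edges
                 for k in range(j + 1, n) if adj[i, k] and adj[j, k]]
    return n, edges, triangles


def boundary_matrices(n, edges, triangles):
    """Oriented boundary maps d1: C1 -> C0 and d2: C2 -> C1 as dense matrices."""
    edge_index = {e: idx for idx, e in enumerate(edges)}
    d1 = np.zeros((n, len(edges)))
    for idx, (i, j) in enumerate(edges):
        d1[i, idx], d1[j, idx] = -1.0, 1.0  # boundary of [i,j] is j - i
    d2 = np.zeros((len(edges), len(triangles)))
    for idx, (i, j, k) in enumerate(triangles):
        # boundary of [i,j,k] is [j,k] - [i,k] + [i,j]
        d2[edge_index[(j, k)], idx] = 1.0
        d2[edge_index[(i, k)], idx] = -1.0
        d2[edge_index[(i, j)], idx] = 1.0
    return d1, d2


def betti_numbers(x, eps, tol=1e-6):
    """(b0, b1) as kernel dimensions of the combinatorial Laplacians."""
    n, edges, triangles = rips_2_skeleton(x, eps)
    d1, d2 = boundary_matrices(n, edges, triangles)
    lap0 = d1 @ d1.T                # graph Laplacian on vertices
    lap1 = d1.T @ d1 + d2 @ d2.T    # combinatorial (Hodge) Laplacian on edges
    b0 = int(np.sum(np.linalg.eigvalsh(lap0) < tol))
    b1 = int(np.sum(np.linalg.eigvalsh(lap1) < tol)) if edges else 0
    return b0, b1


def noisy_circle(n, dim, sigma, rng):
    """Hypothetical data: unit circle in the first two coordinates of R^dim
    plus isotropic Gaussian noise of standard deviation sigma in every coordinate."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    core = np.zeros((n, dim))
    core[:, 0], core[:, 1] = np.cos(theta), np.sin(theta)
    return core + sigma * rng.standard_normal((n, dim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Noisy circle in R^20 plus five far-away stray points (hypothetical data).
    circle = noisy_circle(60, 20, 0.01, rng)
    directions = rng.standard_normal((5, 20))
    outliers = 3.0 * directions / np.linalg.norm(directions, axis=1, keepdims=True)
    data = np.vstack([circle, outliers])
    print(betti_numbers(data, eps=0.35))                    # (6, 1): strays inflate b0
    print(betti_numbers(denoise(data, 0.35, 3), eps=0.35))  # (1, 1): one component, one loop
    # Mixture-of-Gaussians special case: b0 counts the clusters.
    means = 5.0 * np.eye(20)[:3]
    blobs = np.vstack([m + 0.1 * rng.standard_normal((20, 20)) for m in means])
    print(betti_numbers(blobs, eps=2.0))                    # (3, 0)
```

In this sketch the ambient dimension D enters only through the pairwise distances, at O(n^2 D) cost. The complex and its Laplacians depend only on the number of retained points. That loosely mirrors the linear dependence on D described above, though it says nothing about the paper's actual sample-size or noise bounds. Dense eigendecomposition is used for readability; larger samples would call for sparse matrices and an iterative eigensolver.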