Spectral methods for multi-scale feature extraction and data clustering

  • Authors:
  • Allan D. Jepson;Srinivas Chakra Chennubhotla

  • Venue:
  • Spectral methods for multi-scale feature extraction and data clustering
  • Year:
  • 2004

Abstract

We address two issues that are fundamental to the analysis of naturally occurring datasets: how to extract features that arise at multiple scales, and how to cluster items in a dataset using pairwise similarities between the elements. To this end we present two spectral methods: (1) Sparse Principal Component Analysis (S-PCA), a framework for learning a linear, orthonormal basis representation for structure intrinsic to a given dataset; and (2) EigenCuts, an algorithm for clustering items in a dataset using their pairwise similarities. S-PCA is based on the discovery that natural images exhibit structure in a low-dimensional subspace in a local, scale-dependent form. It is motivated by the observation that PCA does not typically recover such representations, due to its single-minded pursuit of variance. In fact, it is widely believed that the analysis of second-order statistics alone is insufficient for extracting multi-scale structure from data, and there are many proposals in the literature showing how to harness higher-order image statistics to build multi-scale representations. In this thesis, we show that resolving second-order statistics with suitably constrained basis directions is indeed sufficient to extract multi-scale structure. In particular, the S-PCA basis optimizes an objective function which trades off correlations among output coefficients for sparsity in the description of basis vector elements. Using S-PCA we present new approaches to the problem of contrast-invariant appearance detection, specifically eye and face detection.
EigenCuts is a clustering algorithm for finding stable clusters in a dataset. Using a Markov chain perspective, we derive an eigenflow to describe the flow of probability mass due to the Markov chain, and characterize it by its eigenvalue or, equivalently, by the halflife of its decay as the Markov chain is iterated. The key insight in this work is that bottlenecks between weakly coupled clusters can be identified by computing the sensitivity of the eigenflow's halflife to variations in the edge weights. The EigenCuts algorithm performs clustering by removing these identified bottlenecks in an iterative fashion. As an efficient step in this process we also propose a specialized hierarchical eigensolver suitable for large stochastic matrices.
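
The link between an eigenvalue, its halflife, and edge-weight sensitivity can be illustrated with a small sketch. This is not the thesis's implementation. It assumes a Markov matrix of the form M = A D^{-1} built from a symmetric affinity matrix A, defines the halflife of an eigenvalue λ as the t that solves λ^t = 1/2, and approximates the sensitivity by a finite difference rather than an analytic derivative. The toy graph (two triangles joined by one weak edge) and all function names are hypothetical, chosen only for illustration.

```python
import math


def second_eigenvalue(A, iters=500):
    """Second-largest eigenvalue of the Markov matrix M = A D^{-1}.

    M has the same eigenvalues as the symmetric matrix
    L = D^{-1/2} A D^{-1/2}, so plain power iteration on L suffices here.
    (Assumption: a small, connected graph; 500 iterations is ample for it.)
    """
    n = len(A)
    d = [sum(row) for row in A]  # node degrees
    L = [[A[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of L (eigenvalue 1) is proportional to sqrt(d).
    total = math.sqrt(sum(d))
    u1 = [math.sqrt(x) / total for x in d]
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    mu = 0.0
    for _ in range(iters):
        # Project out the stationary (eigenvalue 1) component.
        dot = sum(a * b for a, b in zip(v, u1))
        v = [a - dot * b for a, b in zip(v, u1)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Apply (L + I) / 2: its eigenvalues lie in [0, 1], so the
        # iteration converges to the largest remaining eigenvalue of L.
        w = [0.5 * (v[i] + sum(L[i][j] * v[j] for j in range(n))) for i in range(n)]
        mu = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient
        v = w
    return 2.0 * mu - 1.0  # undo the shift


def halflife(lam):
    """Number of Markov-chain iterations t with lam**t == 1/2 (0 < lam < 1)."""
    return -math.log(2.0) / math.log(lam)


def halflife_sensitivity(A, i, j, h=1e-4):
    """Finite-difference change in halflife per unit increase of edge (i, j)."""
    base = halflife(second_eigenvalue(A))
    B = [row[:] for row in A]
    B[i][j] += h
    B[j][i] += h
    return (halflife(second_eigenvalue(B)) - base) / h


def two_triangles(bridge=0.1):
    """Toy graph: two unit-weight triangles joined by one weak edge (2, 3)."""
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
             (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
             (2, 3, bridge)]
    A = [[0.0] * 6 for _ in range(6)]
    for i, j, w in edges:
        A[i][j] = A[j][i] = w
    return A, [(i, j) for i, j, _ in edges]


A, edges = two_triangles()
lam2 = second_eigenvalue(A)
beta = halflife(lam2)
sens = {e: halflife_sensitivity(A, *e) for e in edges}
bottleneck = min(sens, key=sens.get)  # edge whose strengthening shortens the halflife most
print(lam2, beta, bottleneck)
```

On this toy graph the second eigenvalue is roughly 0.97, which corresponds to a halflife of about 22 iterations, and the weak bridge edge (2, 3) has by far the most negative sensitivity: strengthening it is what most quickly destroys the slowly decaying mode. Cutting such an edge and repeating the analysis on the resulting graph is the intuition behind the iterative bottleneck removal described above, although the thesis's actual criteria and eigensolver are not reproduced here.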