Subspace metric ensembles for semi-supervised clustering of high dimensional data

  • Authors:
  • Bojun Yan; Carlotta Domeniconi

  • Affiliations:
  • Department of Information and Software Engineering, George Mason University, Fairfax, Virginia; Department of Information and Software Engineering, George Mason University, Fairfax, Virginia

  • Venue:
  • ECML'06: Proceedings of the 17th European Conference on Machine Learning
  • Year:
  • 2006

Abstract

A critical problem in clustering research is the definition of a proper metric to measure distances between points. Semi-supervised clustering uses information provided by the user, usually expressed as constraints, to guide the search for clusters. Learning effective metrics from constraints in high dimensional spaces remains an open challenge, because the number of parameters to be estimated is quadratic in the number of dimensions and there is seldom enough side-information to estimate them accurately. In this paper, we address the high dimensionality problem by learning an ensemble of subspace metrics. This is achieved by projecting the data and the constraints onto multiple subspaces and learning a positive semi-definite similarity matrix in each. This methodology makes it possible to leverage the given side-information while solving lower dimensional problems. Experiments on high dimensional data (e.g., microarray data) demonstrate the superior accuracy of our method with respect to competitive approaches.
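The general idea of a subspace metric ensemble can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how subspaces are chosen or how each matrix is learned, so everything concrete below is an assumption. Subspaces are taken to be random feature subsets, each subspace metric is a simple stand-in (the regularized inverse of the scatter of must-link difference vectors), cannot-link constraints are ignored, and the ensemble averages the per-subspace squared distances. The names `learn_subspace_metric`, `ensemble_distances`, `n_subspaces`, `subspace_dim`, and `reg` are hypothetical. For scale, a full symmetric metric in d = 1000 dimensions has d(d+1)/2 = 500500 free parameters, whereas each 3-dimensional subspace metric below has only 6.

```python
import numpy as np

def learn_subspace_metric(X_sub, must_link, reg=1e-3):
    """Stand-in metric learner (an assumption, not the paper's method).

    Returns the inverse of the regularized scatter matrix of must-link
    difference vectors, which is symmetric positive definite.
    """
    diffs = np.array([X_sub[i] - X_sub[j] for i, j in must_link])
    scatter = diffs.T @ diffs / len(diffs)
    scatter += reg * np.eye(X_sub.shape[1])  # keeps the matrix invertible
    return np.linalg.inv(scatter)

def ensemble_distances(X, must_link, n_subspaces=5, subspace_dim=3, seed=0):
    """Average squared Mahalanobis distances over random feature subspaces."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    total = np.zeros((n, n))
    for _ in range(n_subspaces):
        # Assumption: a subspace is a random subset of the original features.
        feats = rng.choice(d, size=subspace_dim, replace=False)
        X_sub = X[:, feats]
        # Constraints are index pairs, so projecting the data projects them too.
        A = learn_subspace_metric(X_sub, must_link)
        diff = X_sub[:, None, :] - X_sub[None, :, :]
        # Entry (i, j) is (x_i - x_j)^T A (x_i - x_j) in this subspace.
        total += np.einsum('ijk,kl,ijl->ij', diff, A, diff)
    return total / n_subspaces
```

The resulting pairwise distance matrix could then be passed to any clustering routine that accepts precomputed distances; how the paper actually combines the subspace metrics is not stated in the abstract.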