Enhancing K-Means using class labels

  • Authors: Billy Peralta; Pablo Espinace; Alvaro Soto
  • Affiliations: Pontificia Universidad Católica de Chile, Región Metropolitana, Chile (all authors)
  • Venue: Intelligent Data Analysis
  • Year: 2013

Abstract

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In this work, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means with a convex combination of the costs associated with: (i) a discriminative score based on class labels, and (ii) a generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means on standard real datasets and on an object recognition application, and we compare it against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means has considerably lower execution times than the alternative supervised clustering method under evaluation.
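
The abstract does not give the exact form of the two scores, so the following is only a minimal sketch of the general idea of a convex combination of a discriminative and a generative cost. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the function name lk_means_sketch, the mixing weight alpha, the generative term (squared Euclidean distance to the centroid, rescaled to [0, 1]), and the discriminative term (the fraction of a cluster's current members whose label differs from the point's label).

```python
import numpy as np


def lk_means_sketch(X, y, k, alpha=0.5, n_iter=20, seed=0):
    """Illustrative K-Means-style loop with a label-aware assignment cost.

    cost(i, c) = alpha * disc(i, c) + (1 - alpha) * gen(i, c)
      gen  : squared Euclidean distance from point i to centroid c,
             divided by the largest such distance (assumption)
      disc : fraction of current members of cluster c whose label
             differs from y[i] (assumption)
    Neither term is taken from the paper; both are placeholders.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    classes, y_idx = np.unique(np.asarray(y), return_inverse=True)

    # Initialise centroids from k distinct data points, then assign
    # every point to its nearest centroid (plain K-Means assignment).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)

    for _ in range(n_iter):
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)

        # Generative term: rescaled squared distances, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        scale = d2.max()
        gen = d2 / scale if scale > 0 else d2

        # Discriminative term: per-cluster class histogram -> impurity
        # of each cluster with respect to each point's own label.
        counts = np.zeros((k, len(classes)))
        np.add.at(counts, (assign, y_idx), 1)
        sizes = counts.sum(axis=1, keepdims=True)
        frac = np.divide(counts, sizes, out=np.zeros_like(counts), where=sizes > 0)
        disc = 1.0 - frac[:, y_idx].T  # shape (n, k)

        # Assignment step under the convex combination of both terms.
        new_assign = (alpha * disc + (1 - alpha) * gen).argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign

    return assign, centroids
```

With alpha = 0 this sketch reduces to ordinary K-Means assignments, and with alpha = 1 it uses only the label-based term; intermediate values trade off geometric compactness against class purity, which is the trade-off the abstract describes.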