Translation-invariant mixture models for curve clustering

  • Authors:
  • Darya Chudova;Scott Gaffney;Eric Mjolsness;Padhraic Smyth

  • Affiliations:
  • University of California, Irvine, CA;University of California, Irvine, CA;University of California, Irvine, CA;University of California, Irvine, CA

  • Venue:
  • Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach uses the Expectation-Maximization (EM) algorithm to recover both the mean curve shapes for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power and within-cluster variance on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.