Regression Clustering

  • Authors:
  • Bin Zhang

  • Affiliations:
  • -

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Complex distribution in real-world data is often modeled by a mixture of simpler distributions. Clustering is one of the tools to reveal the structure of this mixture. The same is true of datasets with chosen response variables that people run regression on. Without separating the clusters with very different response properties, the residue error of the regression is large. Input variable selection could also be misguided to a higher complexity by the mixture. In Regression Clustering (RC), K (> 1) regression functions are applied to the dataset simultaneously, and they guide the clustering of the dataset into K subsets, each with a simpler distribution matching its guiding function. Each function is regressed on its own subset of data with a much smaller residue error. Both the regressions and the clustering optimize a common objective function. We present an RC algorithm based on the K-Harmonic Means clustering algorithm and compare it with other existing RC algorithms based on K-Means and EM.
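
The alternation described in the abstract (fit K functions, then regroup the data by which function fits best) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: hard K-Means-style assignments rather than the K-Harmonic Means scheme the paper proposes, a single input variable, straight-line regression functions, and a squared-error objective. The names `fit_line`, `sq_error`, and `rc_kmeans` are hypothetical and chosen only for illustration.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0   # all x equal: fall back to a flat line
    return a, my - a * mx

def sq_error(line, point):
    """Squared residual of one (x, y) point under one line (a, b)."""
    a, b = line
    x, y = point
    return (a * x + b - y) ** 2

def rc_kmeans(points, k, n_iter=100, seed=0):
    """Hard-assignment regression clustering (illustrative assumption,
    not the K-Harmonic Means algorithm of the paper)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]   # random initial partition
    lines = [(0.0, 0.0)] * k
    for _ in range(n_iter):
        # Regression step: refit each function on its own subset.
        for j in range(k):
            subset = [p for p, l in zip(points, labels) if l == j]
            if len(subset) >= 2:                  # keep the old line otherwise
                lines[j] = fit_line(subset)
        # Clustering step: move each point to the function that fits it best.
        new_labels = [min(range(k), key=lambda j: sq_error(lines[j], p))
                      for p in points]
        if new_labels == labels:                  # partition is stable
            break
        labels = new_labels
    # Shared objective: total squared residual under the final assignment.
    total = sum(sq_error(lines[l], p) for p, l in zip(points, labels))
    return lines, labels, total
```

Both steps reduce, or leave unchanged, the same total squared residual, which is one concrete reading of "a common objective function". Like K-Means, this hard-assignment variant depends on the initial partition and may stop at a local optimum.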