Model-based Clustering with Soft Balancing

  • Authors:
  • Shi Zhong; Joydeep Ghosh

  • Venue:
  • ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

Balanced clustering algorithms can be useful in a variety of applications and have recently attracted increasing research interest. Most recent work, however, has addressed only hard balancing, constraining each cluster to have an equal or a certain minimum number of data objects. This paper provides a soft balancing strategy built upon a soft mixture-of-models clustering framework. This strategy constrains the sum of posterior probabilities of object membership for each cluster to be equal and thus balances the expected number of data objects in each cluster. We first derive soft model-based clustering from an information-theoretic viewpoint and then show that the proposed balanced clustering can be parameterized by a temperature parameter that controls the softness of clustering as well as that of balancing. As the temperature decreases, the resulting partitioning becomes more and more balanced. In the limit, when the temperature becomes zero, the balancing becomes hard and the actual partitioning becomes perfectly balanced. The effectiveness of the proposed soft balanced clustering algorithm is demonstrated on both synthetic and real text data.
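The abstract states the constraint (each cluster's posterior probabilities sum to the same value, so every cluster has the same expected size N / K) but not the update equations. The sketch below shows one way to enforce that constraint on tempered posteriors. It assumes equal mixture weights and a precomputed matrix of log-likelihoods log p(x_i | lambda_j), and it uses alternating row/column scaling (Sinkhorn-Knopp iteration, carried out in the log domain). That scaling scheme is our stand-in and is not necessarily the authors' derivation. The names `soft_balanced_posteriors`, `loglik`, `temperature`, `n_iter` and `tol` are hypothetical.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_balanced_posteriors(loglik, temperature, n_iter=1000, tol=1e-9):
    """Tempered posteriors P(j | x_i) whose column sums are all N / K.

    Hypothetical interface: loglik[i][j] is assumed to be log p(x_i | lambda_j)
    for N objects and K models with equal mixture weights. This is a sketch
    using Sinkhorn-Knopp scaling, not the paper's exact algorithm.
    """
    n, k = len(loglik), len(loglik[0])
    log_target = math.log(n / k)   # log of the balanced expected size N / K
    beta = [0.0] * k               # per-cluster log scale factors
    for _ in range(n_iter):
        # Row step: each object's tempered memberships are normalized to sum to 1.
        log_post = []
        for row in loglik:
            scores = [row[j] / temperature + beta[j] for j in range(k)]
            z = _logsumexp(scores)
            log_post.append([s - z for s in scores])
        # Column step: compare each cluster's expected size with N / K.
        log_sizes = [_logsumexp([lp[j] for lp in log_post]) for j in range(k)]
        if max(abs(ls - log_target) for ls in log_sizes) < tol:
            break
        # Scale up under-full clusters and scale down over-full ones.
        beta = [b + log_target - ls for b, ls in zip(beta, log_sizes)]
    return [[math.exp(v) for v in lp] for lp in log_post]


if __name__ == "__main__":
    # Five objects favour cluster 0 and one favours cluster 1.
    loglik = [[0.0, -2.0]] * 5 + [[-2.0, 0.0]]
    post = soft_balanced_posteriors(loglik, temperature=1.0)
    sizes = [sum(row[j] for row in post) for j in range(2)]
    print([round(s, 6) for s in sizes])  # [3.0, 3.0]
```

Without the column step, the same input gives unbalanced expected sizes of roughly 4.5 and 1.5. With it, both clusters receive an expected 3 objects, while the sixth object still leans strongly toward cluster 1. A full clustering loop would alternate this step with re-estimating each model's parameters from posterior-weighted data. That step depends on the model family and is omitted here. Very small temperatures make the scaling converge more slowly, so they may need a larger `n_iter`.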