Semi-supervised Clustering Using Bayesian Regularization

  • Authors:
  • Zuobing Xu;Ram Akella;Mike Ching;Renjie Tang

  • Venue:
  • ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
  • Year:
  • 2007

Abstract

Text clustering is most commonly treated as a fully automated task without user supervision. However, we can improve clustering performance using supervision in the form of pairwise (must-link and cannot-link) constraints. This paper introduces a rigorous Bayesian framework for semi-supervised clustering which incorporates human supervision in the form of pairwise constraints both in the expectation step and maximization step of the EM algorithm. During the expectation step, we model the pairwise constraints as random variables, which enable us to capture the uncertainty in constraints in a principled manner. During the maximization step, we treat the constraint documents as prior information, and adjust the probability mass of model distribution to emphasize words occurring in constraint documents by using Bayesian regularization. Bayesian conjugate prior modeling makes the maximization step more efficient than gradient search methods in the traditional distance learning. Experimental results on several text datasets demonstrate significant advantages over existing algorithms.
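The maximization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes a multinomial mixture over words with a Dirichlet conjugate prior whose pseudo-counts come from word counts in constraint documents. The function name `m_step_word_probs` and the parameters `constraint_counts`, `strength`, and `smoothing` are hypothetical and chosen only for illustration.

```python
def m_step_word_probs(doc_counts, resp, constraint_counts,
                      strength=1.0, smoothing=1.0):
    """Closed-form MAP update of per-cluster word distributions.

    Illustrative assumptions (not taken from the paper):
      doc_counts[d][w]        word count of word w in document d
      resp[d][k]              E-step responsibility of cluster k for document d
      constraint_counts[k][w] word counts pooled from constraint documents
                              associated with cluster k
    The pseudo-count for word w in cluster k is
      smoothing + strength * constraint_counts[k][w],
    which corresponds to a Dirichlet prior with alpha = 1 + pseudo-count.
    """
    n_clusters = len(constraint_counts)
    vocab_size = len(constraint_counts[0])
    theta = []
    for k in range(n_clusters):
        # Expected word counts for cluster k under the current responsibilities.
        expected = [0.0] * vocab_size
        for counts, r in zip(doc_counts, resp):
            for w, c in enumerate(counts):
                expected[w] += r[k] * c
        # Prior pseudo-counts shift probability mass toward constraint words.
        pseudo = [smoothing + strength * constraint_counts[k][w]
                  for w in range(vocab_size)]
        total = sum(expected) + sum(pseudo)
        theta.append([(expected[w] + pseudo[w]) / total
                      for w in range(vocab_size)])
    return theta


# Toy data: 2 documents, 3 vocabulary words, 2 clusters.
doc_counts = [[2, 0, 1], [0, 3, 1]]
resp = [[1.0, 0.0], [0.0, 1.0]]
constraint_counts = [[1, 0, 0], [0, 1, 0]]

theta = m_step_word_probs(doc_counts, resp, constraint_counts, strength=2.0)
baseline = m_step_word_probs(doc_counts, resp, constraint_counts, strength=0.0)
```

Because the prior is conjugate, the update is a single normalization of counts rather than an iterative gradient search. In this toy setup, raising `strength` increases the probability of words that appear in the constraint documents relative to the `strength=0.0` baseline.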