Semi-supervised Document Clustering via Active Learning with Pairwise Constraints

  • Authors:
  • Ruizhang Huang; Wai Lam

  • Venue:
  • ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
  • Year:
  • 2007

Abstract

This paper investigates a framework that discovers pairwise constraints for semi-supervised text document clustering. An active learning approach is proposed that selects informative document pairs on which to obtain user feedback. A gain-directed document pair selection method is designed that measures how much can be learned by revealing the relationship between a pair of documents. Three models are proposed: an uncertainty model, a generation error model, and an objective function model. Language modeling is investigated for representing clusters in the semi-supervised document clustering approach.
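
The abstract does not define the three models, so the following is only an illustrative sketch of what uncertainty-driven pair selection could look like, not the authors' method. It assumes each document has a soft cluster membership vector (a hypothetical input), estimates the probability that two documents share a cluster as the dot product of their vectors, and scores each pair by the binary entropy of that probability. The function names `same_cluster_prob`, `binary_entropy`, and `select_pair` are invented for this example.

```python
import math
from itertools import combinations


def same_cluster_prob(p_i, p_j):
    """Probability that two documents fall in the same cluster,
    assuming independent soft memberships (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(p_i, p_j))


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_pair(memberships):
    """Return the document pair whose must-link/cannot-link answer is most
    uncertain, together with its entropy score. Ties keep the first pair."""
    best_pair, best_gain = None, -1.0
    for i, j in combinations(range(len(memberships)), 2):
        gain = binary_entropy(same_cluster_prob(memberships[i], memberships[j]))
        if gain > best_gain:
            best_pair, best_gain = (i, j), gain
    return best_pair, best_gain


# Toy memberships over two clusters (made-up values for illustration).
memberships = [
    [1.0, 0.0],  # document 0: confidently in cluster 0
    [0.0, 1.0],  # document 1: confidently in cluster 1
    [0.5, 0.5],  # document 2: ambiguous
]
pair, gain = select_pair(memberships)
print(pair, gain)
```

In this toy input, the pair of two confidently separated documents scores zero because its relationship is already clear, while any pair involving the ambiguous document scores highest; a user's answer for such a pair would then be recorded as a must-link or cannot-link constraint.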