Influence of erroneous pairwise constraints in semi-supervised clustering

  • Authors:
  • Tetsuya Yoshida

  • Affiliations:
  • Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

  • Venue:
  • AMT'12 Proceedings of the 8th international conference on Active Media Technology
  • Year:
  • 2012

Abstract

Side information such as pairwise constraints generally improves clustering performance. However, constraints are not always error-free. When erroneous constraints are supplied as side information, treating them as hard constraints can be a disadvantage, since enforcing incorrect constraints can degrade performance. In this paper we conduct extensive experiments to investigate the influence of erroneous pairwise constraints on various document datasets. Several state-of-the-art semi-supervised clustering methods with graph representation were evaluated with respect to both the type and the number of constraints. The experimental results confirmed that treating pairwise constraints as hard constraints makes a method vulnerable to erroneous ones. However, the results also revealed that the influence of erroneous constraints depends on how the constraints are exploited inside a learning algorithm.
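
To make the notion of an erroneous pairwise constraint concrete, the following minimal sketch shows one way such constraints could be simulated from ground-truth labels. It is an illustration only: the abstract does not describe the paper's experimental protocol, and the function name `sample_constraints`, the `error_rate` parameter, and the must-link/cannot-link tuple layout are assumptions introduced here.

```python
import random


def sample_constraints(labels, n_constraints, error_rate, seed=0):
    """Sample pairwise constraints from ground-truth labels.

    Hypothetical helper (not from the paper): each constraint is a tuple
    (i, j, kind, erroneous), where kind is "must-link" or "cannot-link".
    With probability `error_rate` the correct kind is inverted, which
    simulates an erroneous constraint.
    """
    rng = random.Random(seed)
    n = len(labels)
    constraints = []
    for _ in range(n_constraints):
        # Pick two distinct items.
        i, j = rng.sample(range(n), 2)
        # The correct constraint follows from the ground-truth labels.
        must_link = labels[i] == labels[j]
        # Flip the constraint with probability error_rate.
        erroneous = rng.random() < error_rate
        if erroneous:
            must_link = not must_link
        kind = "must-link" if must_link else "cannot-link"
        constraints.append((i, j, kind, erroneous))
    return constraints


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2]
    for i, j, kind, erroneous in sample_constraints(labels, 5, error_rate=0.2):
        print(i, j, kind, "(erroneous)" if erroneous else "")
```

A method that treats every sampled pair as a hard constraint must satisfy the flipped pairs as well, which is the situation the abstract identifies as vulnerable.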