Learning mid-perpendicular hyperplane similarity from cannot-link constraints

  • Authors:
  • Shan Gao;Chen Zu;Daoqiang Zhang

  • Affiliations:
  • Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

Pairwise constraints, known as must-link and cannot-link constraints, are frequently used in semi-supervised clustering. In this paper, we propose a novel use of cannot-link constraints and develop a method called Mid-Perpendicular Hyperplane Similarity (MPHS) for semi-supervised clustering. Since a cannot-link constraint means that the two objects it links are not in the same class, they can be separated by their mid-perpendicular hyperplane, i.e., the hyperplane that passes through their midpoint and is orthogonal to the segment joining them. For each cannot-link constraint, we first compute the corresponding mid-perpendicular hyperplane and then use the distances of objects to this hyperplane to learn a new data representation and similarity matrix. Finally, we combine the similarity matrices from all cannot-link constraints into a single similarity matrix and perform kernel k-means on it to obtain the partition. We implement MPHS in two settings: a simple one in the original input space, for data sets that are nearly linearly separable, and an advanced one in a kernel-induced feature space, for complex data sets that are not linearly separable. Experimental results on several UCI data sets and some image data sets show the effectiveness of our method.
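The pipeline described above (one hyperplane per cannot-link, distances to it, a combined similarity matrix, then kernel k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact per-constraint similarity, so this sketch assumes a Gaussian of the difference in signed distances, averaged over all constraints. The names `mphs_similarity`, `kernel_kmeans`, `sigma` and `rbf_kernel` are hypothetical. The signed distance is written purely in terms of kernel evaluations: with normal w = φ(a) − φ(b) and the hyperplane through (φ(a) + φ(b))/2, it equals (k(x,a) − k(x,b) − (k(a,a) − k(b,b))/2) / √(k(a,a) + k(b,b) − 2k(a,b)). A linear kernel therefore gives the input-space case and any other kernel gives a feature-space variant.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain dot product: the 'simple' input-space setting."""
    return sum(ui * vi for ui, vi in zip(u, v))


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian kernel: one possible (assumed) choice of feature space."""
    return lambda u, v: math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def signed_distance(x: Vector, a: Vector, b: Vector, kernel: Kernel = linear_kernel) -> float:
    """Signed distance from phi(x) to the mid-perpendicular hyperplane of phi(a), phi(b).

    Normal w = phi(a) - phi(b); the hyperplane passes through (phi(a) + phi(b)) / 2.
    Positive values lie on a's side, negative values on b's side.
    """
    kaa, kbb, kab = kernel(a, a), kernel(b, b), kernel(a, b)
    norm_sq = kaa + kbb - 2.0 * kab  # ||phi(a) - phi(b)||^2
    if norm_sq <= 0.0:
        raise ValueError("cannot-linked objects coincide in feature space")
    numerator = kernel(x, a) - kernel(x, b) - 0.5 * (kaa - kbb)
    return numerator / math.sqrt(norm_sq)


def mphs_similarity(X: List[Vector], cannot_links: List[Tuple[int, int]],
                    sigma: float = 1.0, kernel: Kernel = linear_kernel) -> List[List[float]]:
    """Average of per-constraint similarity matrices (Gaussian form is an assumption)."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for p, q in cannot_links:
        # 1-D representation of every object: its signed distance to this hyperplane
        d = [signed_distance(x, X[p], X[q], kernel) for x in X]
        for i in range(n):
            for j in range(n):
                S[i][j] += math.exp(-((d[i] - d[j]) ** 2) / (2.0 * sigma ** 2))
    t = len(cannot_links)
    return [[s / t for s in row] for row in S]


def kernel_kmeans(K: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Kernel k-means on a precomputed similarity (kernel) matrix K."""
    n = len(K)

    def kdist(i: int, j: int) -> float:
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Deterministic farthest-first seeding, starting from object 0
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(kdist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: kdist(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue
                dist = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if dist < best:
                    best_c, best = c, dist
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


X = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
S = mphs_similarity(X, cannot_links=[(0, 3)], sigma=1.0)
labels = kernel_kmeans(S, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

A Gaussian of a one-dimensional feature is positive semidefinite, and an average of such matrices stays positive semidefinite, so kernel k-means is well defined on the combined matrix. Farthest-first seeding is used here only to keep the toy example deterministic. In practice several random restarts would be the more usual choice.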