Pairwise data clustering and applications

  • Authors:
  • Xiaodong Wu; Danny Z. Chen; James J. Mason; Steven R. Schmid

  • Affiliations:
  • Department of Computer Science, University of Texas-Pan American, Edinburg, TX; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN; Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN; Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN

  • Venue:
  • COCOON'03 Proceedings of the 9th annual international conference on Computing and combinatorics
  • Year:
  • 2003

Abstract

Data clustering is an important theoretical topic and a sharp tool for various applications. Its main objective is to partition a given data set into clusters such that the data within the same cluster are "more" similar to each other with respect to certain measures. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. By using a criterion, called the minimum normalized cut, we model the pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. We present a ((4 + o(1)) ln n)-approximation polynomial time algorithm for the minimum normalized cut problem. We also give a more efficient algorithm for this problem by sacrificing the approximation ratio slightly. Further, our scheme achieves a ((2 + o(1)) ln n)-approximation polynomial time algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor.
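
The abstract names the normalized cut criterion without defining it. As a rough illustration only, the Python sketch below assumes the commonly used form Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), which may differ in detail from the definition used in the paper. It evaluates that quantity on a small symmetric similarity matrix and finds the minimum by exhaustive search. The function names and the example matrix are hypothetical, and the exponential-time search is not the approximation algorithm the paper presents; it only shows what is being minimized.

```python
from itertools import combinations


def cut_weight(w, a, b):
    """Sum of w[i][j] over all i in a and j in b."""
    return sum(w[i][j] for i in a for j in b)


def normalized_cut(w, a):
    """Assumed definition: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V), B = V minus A."""
    n = len(w)
    a = set(a)
    b = set(range(n)) - a
    cut = cut_weight(w, a, b)
    assoc_a = cut_weight(w, a, range(n))  # total weight incident to A
    assoc_b = cut_weight(w, b, range(n))  # total weight incident to B
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # degenerate side, treat as infeasible
    return cut / assoc_a + cut / assoc_b


def brute_force_min_ncut(w):
    """Try every bipartition (exponential time; tiny inputs only)."""
    n = len(w)
    best_value, best_side = float("inf"), None
    # Vertex 0 is fixed on side A so each bipartition is visited once.
    for k in range(n - 1):
        for rest in combinations(range(1, n), k):
            side = (0,) + rest
            value = normalized_cut(w, side)
            if value < best_value:
                best_value, best_side = value, set(side)
    return best_value, best_side


# Hypothetical pairwise similarities: two tight triangles {0,1,2} and {3,4,5}
# (weight 4 inside each) joined by one weak edge between vertices 2 and 3.
w = [
    [0, 4, 4, 0, 0, 0],
    [4, 0, 4, 0, 0, 0],
    [4, 4, 0, 1, 0, 0],
    [0, 0, 1, 0, 4, 4],
    [0, 0, 0, 4, 0, 4],
    [0, 0, 0, 4, 4, 0],
]

best_value, best_side = brute_force_min_ncut(w)
print(best_value, best_side)
```

On this matrix the search separates the two triangles, since cutting only the weak edge gives cut = 1 against assoc = 25 on each side. Nothing in the computation relies on the weights satisfying the triangle inequality, which matches the pairwise setting the abstract describes.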