Clustering with internal connectedness

  • Authors:
  • Neelima Gupta;Aditya Pancholi;Yogish Sabharwal

  • Affiliations:
  • Department of Computer Science, Delhi University;Department of Computer Science, Delhi University;IBM Research - India, New Delhi

  • Venue:
  • WALCOM'11 Proceedings of the 5th international conference on WALCOM: algorithms and computation
  • Year:
  • 2011

Abstract

In this paper we study the problem of clustering entities that are described by two types of data: attribute data and relationship data. Attribute data describe the inherent characteristics of the entities, whereas relationship data represent associations among them. Attribute data can be mapped to Euclidean space, but this is not always possible for relationship data. The relationship data are described by a graph whose vertices are the entities and whose edges denote relationships between the pairs of entities they connect. We study clustering problems under a model in which the relationship data impose an 'internal connectedness' constraint: any two entities in a cluster must be connected by an internal path, that is, a path that passes only through entities of the same cluster. We study the k-median and k-means clustering problems under this model. We show that these problems are Ω(log n)-hard to approximate and give O(log n)-approximation algorithms for specific cases of these problems.
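
The internal connectedness constraint described in the abstract can be illustrated with a small feasibility check. The following is a minimal sketch, not the authors' algorithm: it only tests whether a given clustering satisfies the constraint, using a breadth-first search restricted to each cluster. The function names, the adjacency-dictionary representation, and the four-entity path graph are assumptions made for illustration.

```python
from collections import deque


def is_internally_connected(cluster, adjacency):
    """Return True if every pair of entities in `cluster` is joined by a
    path that visits only entities of `cluster` (hypothetical helper)."""
    members = set(cluster)
    if not members:
        return True
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            # Ignore neighbours outside the cluster: paths must stay internal.
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


def clustering_is_feasible(clusters, adjacency):
    """A clustering is feasible when every cluster is internally connected."""
    return all(is_internally_connected(c, adjacency) for c in clusters)


# Assumed example relationship graph: the path a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

feasible = clustering_is_feasible([{"a", "b"}, {"c", "d"}], adjacency)
infeasible = clustering_is_feasible([{"a", "c"}, {"b", "d"}], adjacency)
```

In this example the clustering {a, b}, {c, d} is feasible, while {a, c}, {b, d} is not: a and c are connected in the graph only through b, which lies in the other cluster.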