Clustering with internal connectedness

Authors:
Neelima Gupta; Aditya Pancholi; Yogish Sabharwal
Affiliations:
Department of Computer Science, Delhi University; Department of Computer Science, Delhi University; IBM Research - India, New Delhi
Venue:
WALCOM'11 Proceedings of the 5th international conference on WALCOM: algorithms and computation
Year:
2011

Abstract

In this paper we study the problem of clustering entities that are described by two types of data: attribute data and relationship data. Attribute data describe the inherent characteristics of the entities, whereas relationship data represent associations among them. Attribute data can be mapped to Euclidean space, but this is not always possible for relationship data. The relationship data are described by a graph whose vertices are the entities and whose edges denote a relationship between the pair of entities they connect. We study clustering problems under a model in which the relationship data impose an 'internal connectedness' constraint: any two entities in a cluster must be connected by an internal path, that is, a path that passes only through entities of the same cluster. We study the k-median and k-means clustering problems under this model. We show that these problems are Ω(log n)-hard to approximate and give O(log n)-approximation algorithms for specific cases of these problems.
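
To make the internal connectedness constraint concrete, the following is a minimal Python sketch of a feasibility check together with the two objective functions named in the abstract. It is an illustration only, not the paper's algorithm: the function names `is_internally_connected` and `clustering_cost`, the dictionary-based input format, and the choice of arbitrary center coordinates are all assumptions made for this example.

```python
from collections import defaultdict, deque


def is_internally_connected(edges, labels):
    """Return True if every cluster induces a connected subgraph.

    edges  : iterable of (u, v) pairs of the undirected relationship graph
    labels : dict mapping each entity to its cluster id (hypothetical format)
    """
    # Keep only edges whose endpoints share a cluster; an internal path
    # may use no other edges.
    adj = defaultdict(set)
    for u, v in edges:
        if labels[u] == labels[v]:
            adj[u].add(v)
            adj[v].add(u)

    members = defaultdict(list)
    for node, cluster in labels.items():
        members[cluster].append(node)

    # Breadth-first search from one member must reach the whole cluster.
    for nodes in members.values():
        seen = {nodes[0]}
        queue = deque([nodes[0]])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        if len(seen) != len(nodes):
            return False
    return True


def clustering_cost(points, labels, centers, power=1):
    """Sum over entities of (Euclidean distance to assigned center) ** power.

    power=1 gives the k-median cost, power=2 the k-means cost.
    """
    total = 0.0
    for node, cluster in labels.items():
        sq = sum((a - b) ** 2 for a, b in zip(points[node], centers[cluster]))
        total += sq ** (power / 2)
    return total


# Relationship graph: a path a - b - c - d.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
feasible = {"a": 0, "b": 0, "c": 1, "d": 1}      # both clusters connected
infeasible = {"a": 0, "b": 1, "c": 0, "d": 1}    # a and c have no internal path
print(is_internally_connected(edges, feasible))
print(is_internally_connected(edges, infeasible))
```

In this sketch the first clustering is feasible and the second is not, because `a` and `c` can only reach each other through `b`, which belongs to a different cluster. A clustering that minimizes the attribute-based cost alone may therefore be infeasible under the constraint.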