Joint cluster analysis of attribute data and relationship data: The connected k-center problem, algorithms and applications

Authors:
Rong Ge;Martin Ester;Byron J. Gao;Zengjian Hu;Binay Bhattacharya;Boaz Ben-Moshe
Affiliations:
Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Ariel University Center, Ariel, Israel
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2008


Abstract

Attribute data and relationship data are two principal types of data, representing the intrinsic and extrinsic properties of entities, respectively. While attribute data have been the main source of data for cluster analysis, relationship data such as social networks or metabolic networks are becoming increasingly available. The two data types often carry complementary information, as in market segmentation and community identification, which calls for a joint cluster analysis of both to achieve better results. In this article, we introduce the novel Connected k-Center (CkC) problem, a clustering model that takes into account attribute data as well as relationship data. We analyze the complexity of the problem and prove that it is NP-hard. We therefore study its approximability and present a constant-factor approximation algorithm. For the special case of the CkC problem in which the relationship data form a tree, we propose a dynamic programming method that gives an optimal solution in polynomial time. We further present NetScan, a heuristic algorithm that is efficient and effective for large real databases. Our extensive experimental evaluation on real datasets demonstrates the meaningfulness and accuracy of the NetScan results.
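
The abstract does not state the CkC model formally, so the following is only a minimal sketch of one plausible reading: each of the k clusters must induce a connected subgraph of the relationship graph, and the quality of a clustering is the largest attribute-space distance from any node to its cluster center. The function names (`is_connected`, `ckc_radius`), the Euclidean distance, and the input layout are assumptions made for illustration; this is a checker for a given clustering, not the approximation algorithm, the dynamic program, or NetScan.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def is_connected(nodes, adjacency):
    """Return True if `nodes` induce a connected subgraph (BFS restricted to `nodes`)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def ckc_radius(attributes, edges, clusters):
    """Return the radius of a clustering, or None if it is infeasible.

    attributes: dict mapping node -> coordinate tuple (attribute data)
    edges:      iterable of undirected (u, v) pairs (relationship data)
    clusters:   dict mapping center node -> iterable of members (center included)

    Assumed feasibility conditions (hypothetical reading of the model):
    the clusters partition the node set, each center belongs to its cluster,
    and each cluster is connected in the relationship graph.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    member_sets = {c: set(m) for c, m in clusters.items()}
    covered = set().union(*member_sets.values()) if member_sets else set()
    total = sum(len(m) for m in member_sets.values())
    if covered != set(attributes) or total != len(attributes):
        return None  # clusters overlap or miss some node

    radius = 0.0
    for center, members in member_sets.items():
        if center not in members or not is_connected(members, adjacency):
            return None
        for v in members:
            radius = max(radius, dist(attributes[center], attributes[v]))
    return radius


# Toy example: four nodes on a line, linked as a path a - b - c - d.
attributes = {"a": (0, 0), "b": (1, 0), "c": (5, 0), "d": (6, 0)}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
good = ckc_radius(attributes, edges, {"a": ["a", "b"], "c": ["c", "d"]})
bad = ckc_radius(attributes, edges, {"a": ["a", "c"], "b": ["b", "d"]})
```

In the toy example, the first clustering is feasible under these assumptions, while the second is rejected because neither {a, c} nor {b, d} is connected in the path graph, even though a pure attribute-based method could still consider such groupings.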