Clustering Large Attributed Graphs: An Efficient Incremental Approach

Authors:
Yang Zhou; Hong Cheng; Jeffrey Xu Yu
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010

Cited by 7:

DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I

A model-based approach to attributed graph clustering
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Finding collections of k-clique percolated components in attributed graphs
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II

Combining Relations and Text in Scientific Network Clustering
ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)

Social influence based clustering of heterogeneous information networks
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Efficient community detection in large networks using content and links
Proceedings of the 22nd international conference on World Wide Web

Identification of collective viewpoints on microblogs
Data & Knowledge Engineering


Abstract

In recent years, many networks have become available for analysis, including social, sensor, and biological networks. Graph clustering has proven effective for analyzing and visualizing large networks. Its goal is to partition the vertices of a large graph into clusters based on criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods focus mainly on topological structure but largely ignore vertex properties, which are often heterogeneous. Recently, a new graph clustering algorithm, SA-Cluster, has been proposed that combines structural and attribute similarities through a unified distance measure. SA-Cluster performs matrix multiplication to calculate the random walk distances between graph vertices. Because the edge weights are iteratively adjusted to balance the importance of structural and attribute similarities, the matrix multiplication is repeated in each iteration of the clustering process to recalculate the random walk distances affected by the edge weight update. To improve the efficiency and scalability of SA-Cluster, we propose Inc-Cluster, an efficient algorithm that incrementally updates the random walk distances given the edge weight increments. A complexity analysis estimates how much runtime cost Inc-Cluster can save. Experimental results demonstrate that Inc-Cluster achieves significant speedup over SA-Cluster on large graphs while achieving exactly the same clustering quality in terms of intra-cluster structural cohesiveness and attribute value homogeneity.
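
To make the cost described in the abstract concrete, the following is a minimal sketch of a matrix-multiplication-based random walk distance and of the from-scratch recomputation that an edge weight update forces. The abstract does not give the exact formula, so the truncated sum with a restart probability used here is an assumption, and the names `transition_matrix`, `random_walk_distance`, `max_len`, and `restart` are hypothetical. The sketch shows only the naive recompute baseline, not the Inc-Cluster incremental update.

```python
import numpy as np

def transition_matrix(weights):
    """Row-normalize a non-negative edge weight matrix into transition probabilities."""
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    # Rows with no outgoing weight stay all-zero instead of dividing by zero.
    return np.divide(weights, row_sums,
                     out=np.zeros_like(weights), where=row_sums > 0)

def random_walk_distance(weights, max_len=4, restart=0.2):
    """Assumed form: sum over l = 1..max_len of restart * (1 - restart)**l * P**l."""
    P = transition_matrix(weights)
    power = np.eye(P.shape[0])
    R = np.zeros_like(P)
    for l in range(1, max_len + 1):
        power = power @ P  # one matrix multiplication per walk length
        R += restart * (1 - restart) ** l * power
    return R

# Toy undirected graph on four vertices (illustrative data only).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
R_before = random_walk_distance(W)

# Adjusting a single edge weight changes the transition matrix, so the
# naive approach repeats every matrix multiplication from scratch.
W_updated = W.copy()
W_updated[2, 3] = W_updated[3, 2] = 2.0
R_after = random_walk_distance(W_updated)

print("largest change in any pairwise score:", np.abs(R_after - R_before).max())
```

Each call performs `max_len` dense matrix multiplications, so repeating it in every clustering iteration is the expense that an incremental update aims to avoid.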