Finding density-based subspace clusters in graphs with feature vectors

Authors:
Stephan Günnemann; Brigitte Boden; Thomas Seidl
Affiliations:
Data Management and Data Exploration Group, RWTH Aachen University, Aachen, Germany (all three authors)
Venue:
Data Mining and Knowledge Discovery
Year:
2012

Abstract

Data sources that combine attribute information with network information are widely available in today's applications. To realize the full potential for knowledge extraction, mining techniques such as clustering should consider both types of information simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes and densely connected within the network. While these approaches successfully circumvent the problems of full-space clustering, their cluster definitions restrict them to clusters of particular shapes. In this work we introduce a density-based cluster definition that takes into account both attribute similarity in subspaces and local graph density, enabling us to detect clusters of arbitrary shape and size. Furthermore, we avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC, which uses a fixed point iteration method to determine the clustering solution efficiently. We analytically prove the correctness of this fixed point iteration and derive its complexity. Thorough experiments demonstrate the strengths of DB-CSC compared to related approaches.
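The combination of subspace similarity, local graph density and fixed point iteration described in the abstract can be illustrated with a small sketch. It assumes a fixed subspace `dims`, a maximum-norm distance threshold `eps` on those dimensions, a hop limit `k` for the local graph neighborhood, and a core threshold `min_pts`. These names and the exact neighborhood definition are illustrative assumptions, not the authors' DB-CSC implementation. Clusters are grown DBSCAN-style from core objects, and a simple fixed point loop re-clusters each candidate inside its own induced subgraph until it reproduces itself. Subspace search and the non-redundant cluster selection are not covered.

```python
from collections import deque

def subspace_dist(x, y, dims):
    # Maximum-norm distance using only the dimensions in `dims` (assumed metric).
    return max(abs(x[d] - y[d]) for d in dims)

def k_hop(graph, start, k, allowed):
    # Nodes reachable from `start` in at most k hops, moving only through `allowed` nodes.
    seen, frontier = {start}, [start]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v in allowed and v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return seen

def neighborhood(graph, feats, o, dims, eps, k, allowed):
    # Combined neighborhood: close in the graph AND similar in the subspace.
    return {p for p in k_hop(graph, o, k, allowed)
            if subspace_dist(feats[o], feats[p], dims) <= eps}

def density_clusters(graph, feats, dims, eps, k, min_pts, allowed):
    # DBSCAN-style expansion from core objects (|neighborhood| >= min_pts).
    clusters, assigned = [], set()
    for o in sorted(allowed):
        if o in assigned or len(neighborhood(graph, feats, o, dims, eps, k, allowed)) < min_pts:
            continue
        cluster, queue = set(), deque([o])
        while queue:
            u = queue.popleft()
            if u in cluster:
                continue
            cluster.add(u)
            nb = neighborhood(graph, feats, u, dims, eps, k, allowed)
            if len(nb) >= min_pts:  # only core objects propagate the cluster
                queue.extend(nb - cluster)
        assigned |= cluster
        clusters.append(frozenset(cluster))
    return clusters

def fixed_point_clusters(graph, feats, dims, eps, k, min_pts):
    # Re-cluster each candidate inside its own induced subgraph until it is stable.
    result, pending = set(), [frozenset(graph)]
    while pending:
        cand = pending.pop()
        sub = density_clusters(graph, feats, dims, eps, k, min_pts, cand)
        if cand in sub:
            result.add(cand)  # fixed point: the candidate reproduces itself
        else:
            pending.extend(sub)  # every sub-cluster is a strict subset, so this terminates
    return result

# Toy example: two triangles joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
feats = {0: (0.0, 5.0), 1: (0.2, -3.0), 2: (0.1, 9.0),
         3: (10.0, 0.0), 4: (10.3, 7.0), 5: (9.9, -6.0)}
clusters = fixed_point_clusters(graph, feats, dims=(0,), eps=1.0, k=1, min_pts=3)
print(sorted(sorted(c) for c in clusters))  # → [[0, 1, 2], [3, 4, 5]]
```

Although nodes 2 and 3 are adjacent, they differ strongly in dimension 0, so the sketch reports two separate clusters. Dimension 1 is deliberately noisy and has no effect because it lies outside `dims`, which mirrors the motivation for clustering in subspaces rather than in the full attribute space.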