Parallel structural graph clustering

Authors:
Madeleine Seeland; Simon A. Berger; Alexandros Stamatakis; Stefan Kramer
Affiliations:
Technische Universität München, Institut für Informatik, München, Germany; Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Technische Universität München, Institut für Informatik, München, Germany
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Year:
2011

Abstract

We address the problem of clustering large graph databases according to scaffolds (i.e., large structural overlaps) shared between cluster members. In previous work, an online algorithm was proposed for this task that produces overlapping (non-disjoint) and non-exhaustive clusterings. In this paper, we parallelize this algorithm to take advantage of high-performance parallel hardware and further improve it in three ways: (i) a refined cluster membership test based on a set abstraction of graphs; (ii) sorting graphs by size, to avoid cluster membership tests in the first place; and (iii) the definition of a cluster representative once the cluster scaffold is unique, to avoid comparisons with all cluster members. In experiments on a large database of chemical structures, we show that running times can be reduced by a large factor for one parameter setting used in previous work. For harder parameter settings, results could be obtained within reasonable time for 300,000 structures, compared to 10,000 structures in previous work. This shows that structural, scaffold-based clustering of smaller libraries for virtual screening is already feasible.
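
The abstract does not spell out the set abstraction or the membership criterion, so the following Python sketch is only an illustration of how a cheap pre-filter of this kind might work, not the authors' implementation. It assumes (hypothetically) that each graph is abstracted as a multiset of labeled edges, and that two graphs can share a scaffold only if a common substructure covers at least a fraction `theta` of the larger graph. Under those assumptions, the size comparison and the multiset intersection both give upper bounds on the common substructure, so a failed check means the expensive structural test can be skipped. The names `edge_multiset`, `overlap_upper_bound`, `may_share_scaffold`, and `theta` are all invented for this example.

```python
from collections import Counter


def edge_multiset(edges):
    """Abstract a graph as a multiset of undirected labeled edges.

    Each edge is a pair of endpoint labels, e.g. ("C", "O").
    This representation is an assumption made for illustration.
    """
    return Counter(tuple(sorted(edge)) for edge in edges)


def overlap_upper_bound(a, b):
    """Size of the multiset intersection.

    No common subgraph can contain more edges than this, so it is an
    upper bound on the size of any shared scaffold.
    """
    return sum((a & b).values())


def may_share_scaffold(a, b, theta):
    """Cheap pre-filter run before an expensive structural comparison.

    Returns False only when a shared substructure covering at least
    theta * (size of the larger graph) edges is impossible.
    The threshold criterion itself is a hypothetical choice.
    """
    size_a, size_b = sum(a.values()), sum(b.values())
    needed = theta * max(size_a, size_b)
    # Size check: the smaller graph bounds any common substructure.
    if min(size_a, size_b) < needed:
        return False
    # Set-abstraction check: shared edge labels bound it as well.
    return overlap_upper_bound(a, b) >= needed


def candidate_pairs(graphs, theta):
    """Sort graphs by size and list index pairs that pass the pre-filter."""
    order = sorted(range(len(graphs)), key=lambda i: sum(graphs[i].values()))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if may_share_scaffold(graphs[i], graphs[j], theta):
                pairs.append((i, j))
    return pairs
```

Only the pairs returned by `candidate_pairs` would need to go on to a full structural comparison. The filter can produce false positives but never false negatives under the stated assumptions, because both checks are upper bounds on the true overlap.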