Mining Large Networks with Subgraph Counting

Authors:
Ilaria Bordino;Debora Donato;Aristides Gionis;Stefano Leonardi
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing: 0
Cited by: 8

Output space sampling for graph patterns
Proceedings of the VLDB Endowment

Frequent subgraph mining on a single large graph using sampling techniques
Proceedings of the Eighth Workshop on Mining and Learning with Graphs

Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)

Approximate counting of cycles in streams
ESA'11 Proceedings of the 19th European conference on Algorithms

Multi-agent adaptive boosting on semi-supervised water supply clusters
Advances in Engineering Software

Streaming and communication complexity of clique approximation
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I

Counting arbitrary subgraphs in data streams
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part II

On the streaming complexity of computing local clustering coefficients
Proceedings of the sixth ACM international conference on Web search and data mining

Abstract

Mining frequent patterns in networks has many applications, including the analysis of complex networks, clustering of graphs, finding communities in social networks, and indexing of graphical and biological databases. Despite this wealth of applications, the current state of the art lacks algorithmic tools for counting the subgraphs contained in a large network. In this paper we develop data-stream algorithms that approximate the number of occurrences of all subgraphs on three and four vertices in directed and undirected networks. We show that the frequencies of occurrence of these subgraphs are significant for characterizing different kinds of networks: using them, we achieve very good precision in clustering networks with similar structure. The significance of our method is supported by the fact that such high precision cannot be achieved when clustering is based on simpler topological properties, such as degree, assortativity, and eigenvector distributions. We have also tested our techniques using swap randomization.
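
To make the idea of approximating subgraph counts from an edge stream concrete, here is a minimal sketch for the simplest case, triangles in an undirected graph. It is not the algorithm of the paper, which the abstract does not spell out. It uses a generic edge-sampling estimator instead: keep each streamed edge independently with probability p, count triangles exactly in the sample, and scale by 1/p^3, since a triangle survives only if all three of its edges are kept. The function names and the `seed` parameter are assumptions made for illustration.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u, nbrs in adj.items():
        # A pair of neighbours of u that are themselves adjacent closes a triangle.
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                total += 1
    return total // 3  # each triangle was found once from each of its 3 vertices


def estimate_triangles(edge_stream, p, seed=0):
    """Illustrative one-pass estimator (hypothetical helper, not from the paper).

    Each edge is kept with probability p; the sampled count is rescaled by
    1 / p**3 because all three edges of a triangle must survive the sampling.
    """
    rng = random.Random(seed)
    kept = [edge for edge in edge_stream if rng.random() < p]
    return count_triangles(kept) / p ** 3


if __name__ == "__main__":
    # Complete graph on 4 vertices: every 3-vertex subset is a triangle.
    k4 = list(combinations(range(4), 2))
    print(count_triangles(k4))            # exact count
    print(estimate_triangles(k4, p=0.5))  # noisy estimate from a sampled stream
```

With p = 1.0 every edge is kept and the estimate equals the exact count. Smaller values of p reduce memory at the cost of higher variance. Extending the same idea to all three- and four-vertex subgraphs, and to directed graphs, requires a separate counter and scaling factor per pattern.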