High quality, scalable and parallel community detection for large real graphs

Authors:
Arnau Prat-Pérez;David Dominguez-Sal;Josep-Lluis Larriba-Pey
Affiliations:
DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain;Sparsity Technologies, Barcelona, Spain;DAMA-UPC, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
Proceedings of the 23rd international conference on World wide web
Year:
2014

Abstract

Community detection has become one of the most relevant topics in graph mining, principally for its applications in domains such as social or biological network analysis. Many community detection algorithms have been proposed during the last decade, approaching the problem from different perspectives. However, existing algorithms are, in general, based on complex and expensive computations, making them unsuitable for large graphs with millions of vertices and edges such as those usually found in the real world. In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD). By combining different strategies, SCD partitions the graph by maximizing the Weighted Community Clustering (WCC), a recently proposed community detection metric based on triangle analysis. Using real graphs with ground-truth overlapping communities, we show that SCD outperforms the current state-of-the-art proposals (even those aimed at finding overlapping communities) in terms of quality and performance. SCD provides the speed of the fastest algorithms and the quality, in terms of NMI and F1Score, of the most accurate state-of-the-art proposals. We show that SCD is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.
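
To make "triangle analysis" concrete, the following is a minimal Python sketch of the basic quantity that triangle-based metrics build on: how many triangles a vertex closes in the whole graph, and how many of those lie inside a candidate community. It is not the WCC definition and not the SCD algorithm. The function names `build_adjacency` and `triangles_at` and the toy edge list are assumptions made for this illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (vertex -> set of neighbours)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_at(adj, v, members=None):
    """Count the triangles that contain vertex v.

    If `members` is given, only triangles whose other two vertices
    belong to `members` are counted (triangles inside a community).
    """
    nbrs = adj.get(v, set())
    if members is not None:
        nbrs = nbrs & set(members)
    # A triangle at v is a pair of v's neighbours that are themselves adjacent.
    return sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_adjacency(edges)

print(triangles_at(adj, 2))                # triangles at vertex 2 in the whole graph
print(triangles_at(adj, 2, {0, 1, 2}))     # triangles at vertex 2 inside {0, 1, 2}
print(triangles_at(adj, 2, {2, 3, 4, 5}))  # triangles at vertex 2 inside {2, 3, 4, 5}
```

In this toy graph, vertex 2 closes its only triangle with vertices 0 and 1, so placing it in the community {0, 1, 2} keeps that triangle internal, while placing it with {3, 4, 5} keeps none. A triangle-based metric would reward the first assignment over the second.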