A distributed algorithm for community detection in large graphs

  • Authors:
  • Harris Papadakis;Costas Panagiotakis;Paraskevi Fragopoulou

  • Affiliations:
  • Technological Educational Institute of Crete, Crete, Greece;Technological Educational Institute of Crete, Crete, Greece;Technological Educational Institute of Crete, Crete, Greece

  • Venue:
  • Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Networks in various application domains present an internal structure, where nodes form groups of tightly connected components which are more loosely connected to the rest of the network. Several attempts have been made to provide a formal definition to the generally described "community finding" concept, providing different approaches. Some algorithms follow an iterative approach starting by characterizing either the entire network, or each individual node as community, and splitting [1] or merging communities respectively, producing a hierarchical tree of nested communities, called dendrogram. Several researchers aim to find the entire hierarchical community dendrogram [1] while others wish to identify only the optimal community partition. Some researchers aim at discovering distinct (non-overlapping) communities, while others allow for overlaps [2]. The Blondel algorithm described by Blondel et al. in [3], follows a bottom-up approach. Each node in the graph comprises a singleton community. Two communities are merged into one if the resulting community has larger modularity value that both the initial ones. This is a fast and accurate algorithm which, detects all communities in the graph. In suffers however in the sense that it constantly, during its execution, requires the knowledge of a global information of the graph, namely the number of its edges (which change as the algorithm modifies the graph), limiting its distributed nature. The Infomap algorithm [4] transforms the problem of community detection into efficiently compressing the structure of the graph, so that one can recovered almost the entire structure from the compressed form. This is achieved by minimizing a function that expresses the tradeoff between compression factor and loss of information (difference between the original graph and the reconstructed graph).