External memory K-bisimulation reduction of big graphs

  • Authors:
  • Yongming Luo;George H.L. Fletcher;Jan Hidders;Yuqing Wu;Paul De Bra

  • Affiliations:
  • Eindhoven University of Technology, Eindhoven, Netherlands;Eindhoven University of Technology, Eindhoven, Netherlands;Delft University of Technology, Delft, Netherlands;Indiana University Bloomington, Bloomington, IN, USA;Eindhoven University of Technology, Eindhoven, Netherlands

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph which share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort}(|Et|) + k · scan(|Nt|) + sort(|Nt|)), while our maintenance algorithms are bounded by O(k · sort}(|Et|) + k · scan(|Nt|). The space complexity bounds are O(|Nt|+|Et|)$ and O(k · |Nt|+k ·|Et|), resp. Here, |Et| and |Nt| are the number of disk pages occupied by the input graph's edge set and node set, resp., and sort(n) and scan(n) are the cost of sorting and scanning, resp., a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.