GConnect: a connectivity index for massive disk-resident graphs

  • Authors:
  • Charu Aggarwal; Yan Xie; Philip S. Yu

  • Affiliations:
  • IBM T.J. Watson Research Center, Hawthorne, NY; University of Illinois at Chicago, Chicago, IL; University of Illinois at Chicago, Chicago, IL

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2009

Abstract

Connectivity is an extremely important problem in the context of massive graphs. In many large communication networks, social networks, and other graphs, it is desirable to determine the minimum cut between any pair of nodes. The problem is well solved in the classical literature, since it is related to the maximum-flow problem, which can be solved efficiently. However, large graphs are often disk-resident, and such graphs cannot be processed efficiently for connectivity queries. This is because the minimum-cut problem is typically solved with combinatorial and flow-based techniques that require random access to the edges of the underlying graph. In this paper, we propose a connectivity index for massive disk-resident graphs. We use an edge-sampling approach to create compressed representations of the underlying graphs. Since these compressed representations can be held in main memory, they can be used to derive efficient approximations for the minimum-cut problem. The compressed representations are then organized into a disk-resident index structure. We present experimental results showing that the resulting approach makes query processing two to three orders of magnitude more efficient than a disk-resident approach, at the cost of a small loss in accuracy.
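The abstract does not specify the sampling scheme or the index layout, so the following is only an illustrative sketch of the general idea of approximating a pairwise minimum cut from an edge sample, under stated assumptions. It assumes an undirected graph with unit-capacity edges and uses plain uniform sampling: each edge is kept independently with probability p, the s-t minimum cut of the much smaller sample is computed exactly through Edmonds-Karp maximum flow (max-flow value equals min-cut value), and the result is scaled back up by 1/p. This Karger-style uniform-sampling estimator is a stand-in chosen for illustration, not the GConnect method, and the function names `max_flow` and `sampled_min_cut` are hypothetical.

```python
import itertools
import random
from collections import defaultdict, deque


def max_flow(edges, s, t):
    """Edmonds-Karp max flow on an undirected graph with unit-capacity edges.

    By the max-flow/min-cut theorem, the returned value is the s-t min cut.
    """
    if s == t:
        raise ValueError("s and t must differ")
    # Residual capacities: each undirected edge adds 1 in both directions;
    # parallel edges simply accumulate capacity.
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path remains
        # Bottleneck residual capacity along the path found.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push flow along the path and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def sampled_min_cut(edges, s, t, p, seed=None):
    """Hypothetical uniform-sampling estimate of the s-t min cut.

    Assumption for illustration only (not the GConnect algorithm): keep each
    edge with probability p, solve exactly on the sample, rescale by 1/p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]
    return max_flow(sample, s, t) / p


# Two 8-node cliques joined by three bridge edges: the exact s-t min cut is 3.
clique_a = list(itertools.combinations(range(0, 8), 2))
clique_b = list(itertools.combinations(range(8, 16), 2))
bridges = [(0, 8), (1, 9), (2, 10)]
edges = clique_a + clique_b + bridges

print(max_flow(edges, 5, 13))  # 3
print(sampled_min_cut(edges, 5, 13, p=0.5, seed=1))  # sampled estimate
```

Lowering p shrinks the sample that must fit in memory but makes the rescaled estimate noisier, which mirrors in miniature the efficiency/accuracy trade-off described in the abstract.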