GConnect: a connectivity index for massive disk-resident graphs

Authors:
Charu Aggarwal;Yan Xie;Philip S. Yu
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY;University of Illinois at Chicago, Chicago, IL;University of Illinois at Chicago, Chicago, IL
Venue:
Proceedings of the VLDB Endowment
Year:
2009


Abstract

Connectivity is a fundamental problem for massive graphs. In many large communication networks, social networks, and other graphs, it is desirable to determine the minimum cut between any pair of nodes. The problem is well solved in the classical literature, since it is related to the maximum-flow problem, which is efficiently solvable. However, large graphs are often disk-resident, and such graphs cannot be processed efficiently for connectivity queries, because the minimum-cut problem is typically solved with combinatorial and flow-based techniques that require random access to the underlying edges of the graph. In this paper, we propose a connectivity index for massive disk-resident graphs. We use an edge-sampling-based approach to create compressed representations of the underlying graphs. Since these compressed representations can be held in main memory, they can be used to derive efficient approximations for the minimum-cut problem. These compressed representations are then organized into a disk-resident index structure. We present experimental results which show that the resulting approach provides query processing that is two to three orders of magnitude more efficient than a disk-resident approach, at the expense of a small amount of accuracy.
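To make the idea of answering minimum-cut queries on a sampled, memory-resident graph more concrete, here is a minimal Python sketch. It is not the GConnect index itself: the abstract does not specify the sampling scheme or index layout, so the sketch assumes plain uniform edge sampling with probability `p`, an undirected graph with unit edge capacities, and a simple rescaling of the sampled cut by `1/p`. The function names (`sample_edges`, `min_cut`, `approx_min_cut`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict, deque


def sample_edges(edges, p, seed=0):
    """Keep each undirected edge independently with probability p (assumed scheme)."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]


def min_cut(edges, s, t):
    """Exact s-t minimum cut of an undirected, unit-capacity graph.

    Uses BFS augmenting paths (Edmonds-Karp); by max-flow/min-cut duality
    the returned flow value equals the size of the minimum cut.
    """
    if s == t:
        raise ValueError("s and t must be distinct")
    cap = defaultdict(int)   # residual capacities, keyed by directed pair
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue         # self-loops never cross a cut
        cap[(u, v)] += 1     # an undirected edge is usable in both directions
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow      # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def approx_min_cut(edges, s, t, p, seed=0):
    """Estimate the s-t minimum cut from a uniform edge sample, rescaled by 1/p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return min_cut(sample_edges(edges, p, seed), s, t) / p
```

With `p = 1.0` the sample is the full edge list and the estimate coincides with the exact cut; smaller values of `p` shrink the in-memory graph at the cost of accuracy, which mirrors the trade-off described in the abstract. How samples are chosen per query and organized on disk is left open here.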