External memory k-bisimulation reduction of big graphs

Authors:
Yongming Luo;George H.L. Fletcher;Jan Hidders;Yuqing Wu;Paul De Bra
Affiliations:
Eindhoven University of Technology, Eindhoven, Netherlands;Eindhoven University of Technology, Eindhoven, Netherlands;Delft University of Technology, Delft, Netherlands;Indiana University Bloomington, Bloomington, IN, USA;Eindhoven University of Technology, Eindhoven, Netherlands
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

In this paper, we present what are, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph and for maintaining such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence that intuitively groups together nodes in a graph that share fundamental structural features. k-bisimulation is the standard variant of bisimulation in which the topological features of nodes are considered only within a local neighborhood of radius k ≥ 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|Et|) + k · scan(|Nt|) + sort(|Nt|)), while our maintenance algorithms are bounded by O(k · sort(|Et|) + k · scan(|Nt|)). The space complexity bounds are O(|Nt| + |Et|) and O(k · |Nt| + k · |Et|), resp. Here, |Et| and |Nt| are the numbers of disk pages occupied by the input graph's edge set and node set, resp., and sort(n) and scan(n) are the costs of sorting and scanning, resp., a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
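
To make the notion of k-bisimulation in the abstract concrete, the following is a minimal in-memory sketch in Python. It is not the external-memory algorithm of the paper: it assumes the whole graph fits in RAM, and the function name `k_bisimulation`, the input representation (a label dictionary plus edge triples), and the toy graph are all assumptions made for illustration. It uses the common signature-based formulation, in which two nodes share a block after round i if they have the same label and the same set of (edge label, block of target after round i-1) pairs. The k refinement rounds loosely mirror the factor k in the stated I/O bounds.

```python
from collections import defaultdict


def k_bisimulation(node_labels, edges, k):
    """Map each node to a block id of the k-bisimulation partition.

    node_labels: dict node -> node label (hypothetical input format)
    edges: iterable of (source, edge_label, target) triples
    k: neighborhood radius, k >= 0
    """
    # Outgoing adjacency lists: node -> [(edge_label, target), ...]
    out = defaultdict(list)
    for src, elabel, dst in edges:
        out[src].append((elabel, dst))

    def assign_ids(signatures):
        # Give equal signatures the same integer block id.
        ids = {}
        return {n: ids.setdefault(sig, len(ids)) for n, sig in signatures.items()}

    # Round 0: nodes are equivalent iff they carry the same label.
    pid0 = assign_ids(dict(node_labels))
    pid = pid0

    # Rounds 1..k: refine using the previous round's block ids of the targets.
    for _ in range(k):
        sigs = {
            n: (pid0[n], frozenset((el, pid[d]) for el, d in out[n]))
            for n in node_labels
        }
        pid = assign_ids(sigs)
    return pid


# Toy graph (illustrative only): two chains, 1 -> 2 -> 3 and 4 -> 5.
labels = {1: "n", 2: "n", 3: "n", 4: "n", 5: "n"}
edges = [(1, "e", 2), (2, "e", 3), (4, "e", 5)]

p0 = k_bisimulation(labels, edges, 0)
p1 = k_bisimulation(labels, edges, 1)
p2 = k_bisimulation(labels, edges, 2)
```

On this toy graph, radius 0 puts all nodes in one block, radius 1 separates nodes with an outgoing edge (1, 2, 4) from those without (3, 5), and radius 2 further separates node 1, whose successor itself has a successor, from nodes 2 and 4. An external-memory variant would have to replace the in-memory dictionaries with sorted, disk-resident tables processed by sorting and scanning, which is where the sort and scan terms in the bounds come from.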